# Data Retrieval 
This notebook describes the process that can be used to retrieve the data independently for the Earthscope short course on MsPASS held in July 2024.  The procedures here are useful for retrieving small data sets via web services with MsPASS, but are expected to become archaic when the new Earthscope cloud system is fully functional.  

## Retrieval with MsPASS
If you are running MsPASS in the normal way, with the docker container or the anaconda package, the following script can be used.  It uses ObsPy's web service client to retrieve a set of QuakeML format data from IRIS and packages them into what ObsPy calls a "Catalog" object.  We then use a method of the MsPASS Database class to save that data to MongoDB. Finally, we retrieve that data sorted in time order and load the result into a pandas DataFrame that we assign to the symbol "df".  

In [1]:
from obspy import UTCDateTime
from obspy.clients.fdsn import Client
client=Client("IRIS")
ts=UTCDateTime('2011-01-01T00:00:00.0')
starttime=ts
te=UTCDateTime('2012-01-01T00:00:00.0')
endtime=te
lat0=38.3
lon0=142.5
minmag=7.0

cat=client.get_events(starttime=starttime,endtime=endtime,
        minmagnitude=minmag)
# this is a weird incantation suggested by obspy to print a summary of all the events
print(cat.__str__(print_all=True))

20 Event(s) in Catalog:
2011-12-14T05:04:57.810000Z |  -7.528, +146.814 | 7.1  MW
2011-10-28T18:54:34.750000Z | -14.557,  -76.121 | 7.0  MW
2011-10-23T10:41:22.010000Z | +38.729,  +43.447 | 7.1  MW
2011-10-21T17:57:17.310000Z | -28.881, -176.033 | 7.4  MW
2011-09-15T19:31:03.160000Z | -21.593, -179.324 | 7.3  MW
2011-09-03T22:55:35.760000Z | -20.628, +169.778 | 7.0  MW
2011-08-24T17:46:11.560000Z |  -7.620,  -74.538 | 7.0  MW
2011-08-20T18:19:24.610000Z | -18.331, +168.226 | 7.0  MW
2011-08-20T16:55:04.090000Z | -18.277, +168.067 | 7.1  MW
2011-07-10T00:57:10.910000Z | +38.055, +143.302 | 7.0  MW
2011-07-06T19:03:20.470000Z | -29.307, -176.257 | 7.6  MW
2011-06-24T03:09:38.920000Z | +51.980, -171.820 | 7.3  MW
2011-04-07T14:32:44.100000Z | +38.251, +141.730 | 7.1  MW
2011-03-11T06:25:50.740000Z | +38.051, +144.630 | 7.6  MW
2011-03-11T06:15:37.570000Z | +36.227, +141.088 | 7.9  MW
2011-03-11T05:46:23.200000Z | +38.296, +142.498 | 9.1  MW
2011-03-09T02:45:19.590000Z | +38.441, +142.980 

In [2]:
from mspasspy.db.database import Database   # This isn't strictly needed but used here because db set below is an instance of this class
import mspasspy.client as msc
dbclient=msc.Client()
db = dbclient.get_database('scoped2024')
n=db.save_catalog(cat)
print('number of event entries saved in source collection=',n)

number of event entries saved in source collection= 20


In [3]:
import pandas as pd
# We need only these basics to compare to previous output as a cross check
projection={
    "time":1,
    "lat":1,
    "lon":1,
    "depth":1,
    "magnitude":1,
}
cursor=db.source.find({},projection).sort("time")
# a pymongo cursor is iterable, so it can be converted directly to a list of documents
doclist=list(cursor)
df = pd.DataFrame(doclist)
print(df)

                         _id      lat       lon  depth          time  \
0   6701261c1ccaaad58a479002 -26.8513  -63.2373  584.3  1.293876e+09   
1   6701261c1ccaaad58a479001 -38.3907  -73.3993   24.4  1.294000e+09   
2   6701261c1ccaaad58a479000  28.6831   63.9948   79.9  1.295382e+09   
3   6701261c1ccaaad58a478fff  38.4407  142.9803   26.2  1.299639e+09   
4   6701261c1ccaaad58a478ffe  38.2963  142.4980   19.7  1.299822e+09   
5   6701261c1ccaaad58a478ffd  36.2274  141.0880   25.4  1.299824e+09   
6   6701261c1ccaaad58a478ffc  38.0510  144.6297   19.8  1.299825e+09   
7   6701261c1ccaaad58a478ffb  38.2513  141.7296   53.2  1.302187e+09   
8   6701261c1ccaaad58a478ffa  51.9805 -171.8201   49.8  1.308885e+09   
9   6701261c1ccaaad58a478ff9 -29.3073 -176.2572   25.4  1.309979e+09   
10  6701261c1ccaaad58a478ff8  38.0553  143.3016   24.7  1.310259e+09   
11  6701261c1ccaaad58a478ff7 -18.2774  168.0670   34.6  1.313859e+09   
12  6701261c1ccaaad58a478ff6 -18.3312  168.2258   31.5  1.313864
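
The `time` column above holds origin times as epoch seconds (seconds since 1970-01-01 UTC), which is why the values look so different from the catalog listing printed earlier. A minimal sketch of the cross check follows, using only the standard library. The helper name `epoch_to_utc_string` is hypothetical and not part of MsPASS or ObsPy, and the sample value is the catalog origin time 2011-03-11T05:46:23.2 written out by hand as epoch seconds.

```python
from datetime import datetime, timezone

def epoch_to_utc_string(epoch_seconds: float) -> str:
    """Convert epoch seconds to an ISO 8601 style UTC string."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")

# Origin time of the 2011-03-11 magnitude 9.1 event expressed as epoch seconds
tohoku_epoch = 1299822383.2
print(epoch_to_utc_string(tohoku_epoch))
```

With ObsPy available, `UTCDateTime(epoch_seconds)` performs the same conversion, which is what the download loop later in this notebook relies on.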

In [4]:
csvfilename="workshop_sources.csv"
df.to_csv(csvfilename)

## Read CSV file to create/recreate DataFrame
Alternatively, if you are running this without an instance of MongoDB available and were supplied a copy of the CSV file created immediately above, you can load that file to create the DataFrame with the next box.  If you run this notebook from start to finish this next box is redundant, but it allows the notebook to be started from this point without running any of the boxes above.

In [5]:
# these two lines are repeated to allow start here instead of at the top
import pandas as pd
csvfilename="workshop_sources.csv"
df = pd.read_csv(csvfilename)
print(df)

    Unnamed: 0                       _id      lat       lon  depth  \
0            0  6701261c1ccaaad58a479002 -26.8513  -63.2373  584.3   
1            1  6701261c1ccaaad58a479001 -38.3907  -73.3993   24.4   
2            2  6701261c1ccaaad58a479000  28.6831   63.9948   79.9   
3            3  6701261c1ccaaad58a478fff  38.4407  142.9803   26.2   
4            4  6701261c1ccaaad58a478ffe  38.2963  142.4980   19.7   
5            5  6701261c1ccaaad58a478ffd  36.2274  141.0880   25.4   
6            6  6701261c1ccaaad58a478ffc  38.0510  144.6297   19.8   
7            7  6701261c1ccaaad58a478ffb  38.2513  141.7296   53.2   
8            8  6701261c1ccaaad58a478ffa  51.9805 -171.8201   49.8   
9            9  6701261c1ccaaad58a478ff9 -29.3073 -176.2572   25.4   
10          10  6701261c1ccaaad58a478ff8  38.0553  143.3016   24.7   
11          11  6701261c1ccaaad58a478ff7 -18.2774  168.0670   34.6   
12          12  6701261c1ccaaad58a478ff6 -18.3312  168.2258   31.5   
13          13  6701
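
The `Unnamed: 0` column in the output above appears because `to_csv` writes the DataFrame index as an unlabeled first column and `read_csv` then treats it as data. It is harmless for the download loop below, which only uses the `time` column. If you prefer to avoid it, one option is to tell `read_csv` that the first column is the index. This is a minimal sketch with a two-row stand-in table; the values are made up for illustration, and an in-memory buffer replaces the real file.

```python
import io
import pandas as pd

# Stand-in for the source table; values are illustrative only
df_demo = pd.DataFrame({"lat": [38.3, -18.3], "time": [1.2998e9, 1.3138e9]})

buffer = io.StringIO()
df_demo.to_csv(buffer)          # index is written as an unlabeled first column
buffer.seek(0)

df_plain = pd.read_csv(buffer)  # default: the index comes back as "Unnamed: 0"
buffer.seek(0)
df_indexed = pd.read_csv(buffer, index_col=0)  # first column restored as the index

print(list(df_plain.columns))
print(list(df_indexed.columns))
```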

## Waveform Retrieval
Finally, we retrieve the waveform data with ObsPy's get_waveforms method and save the results as a set of miniseed files, one per event, in a directory we create with the name "./wf".  
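
For each event, the download box below requests a window that starts `start_offset` seconds after the origin time and ends `end_offset` seconds after it. The following is a small sketch of that arithmetic with the values used in this notebook (300 s and 45 minutes). The function name `request_window` is hypothetical and is used here only for illustration.

```python
def request_window(origin_time, start_offset=300.0, end_offset=45 * 60.0):
    """Return (start, end) epoch times of the waveform request window."""
    return origin_time + start_offset, origin_time + end_offset

# Origin time in epoch seconds (2011-03-11T05:46:23.2 UTC)
start, end = request_window(1299822383.2)
print("window length in minutes =", (end - start) / 60.0)
```

Each file therefore holds roughly 40 minutes of data beginning 5 minutes after the origin time.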

Note that when I ran this on a fairly standard "high speed internet" connection, it took just under 2 hours.   
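
A download this long can be interrupted by a dropped connection or a web service error, and the loop below would then stop with an exception. One way to make it more forgiving is to wrap each request in a retry helper. The following is a generic sketch, not part of ObsPy or MsPASS. `call_with_retry` and its parameters are hypothetical names, and the stand-in `flaky_request` function only simulates a request that fails twice before succeeding.

```python
import time

def call_with_retry(func, max_attempts=3, wait_seconds=0.0):
    """Call func() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as err:
            print("attempt", attempt, "failed:", err)
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            time.sleep(wait_seconds)

# Stand-in for a web service request that fails twice, then succeeds
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "data"

result = call_with_retry(flaky_request)
print(result, "after", calls["count"], "attempts")
```

In the download box, the call to `client.get_waveforms` could be passed in as a `lambda` so that a transient failure on one event does not end the whole run. A nonzero `wait_seconds` gives the service some time to recover between attempts.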

In [6]:
import os
import time
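# UTCDateTime is imported again so this box also works when starting from the CSV box above
from obspy import UTCDateTime
from obspy.clients.fdsn import RoutingClient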
client = RoutingClient("iris-federator")

t0=time.time()
outdir = "./wf"
# obspy's writer does not create a directory if it doesn't exist
# this standard incantation does that
wfdir_exists = os.path.exists(outdir)
if not wfdir_exists:
    os.makedirs(outdir)
    print("Output directory = ",outdir," did not exists and was created")
start_offset=300.0
end_offset=45*60.0
i=0
for origin_time in df["time"]: 
    print('Starting to retrieve data for event number',i,' with origin time=',UTCDateTime(origin_time))
    stime=origin_time+start_offset
    etime=origin_time+end_offset
    strm=client.get_waveforms(
            starttime=UTCDateTime(stime),
            endtime=UTCDateTime(etime),
            network='TA',
            channel='BH?',
            location='*'
        )
    fname = outdir + "/Event_{}.msd".format(i)
    print('writing miniseed format data to file=',fname)
    strm.write(fname,format='MSEED')
    i += 1

# i counts events (one miniseed file per event), not individual waveforms
print('Number of event files saved=',i)
t = time.time()
print('Time required for download=',t-t0)

Output directory =  ./wf  did not exists and was created
Starting to retrieve data for event number 0  with origin time= 2011-01-01T09:56:58.460000Z
writing miniseed format data to file= ./wf/Event_0.msd


This might have a negative influence on the compatibility with other programs.


Starting to retrieve data for event number 1  with origin time= 2011-01-02T20:20:18.170000Z
writing miniseed format data to file= ./wf/Event_1.msd
Starting to retrieve data for event number 2  with origin time= 2011-01-18T20:23:25.570000Z
writing miniseed format data to file= ./wf/Event_2.msd
Starting to retrieve data for event number 3  with origin time= 2011-03-09T02:45:19.590000Z
writing miniseed format data to file= ./wf/Event_3.msd
Starting to retrieve data for event number 4  with origin time= 2011-03-11T05:46:23.200000Z
writing miniseed format data to file= ./wf/Event_4.msd
Starting to retrieve data for event number 5  with origin time= 2011-03-11T06:15:37.570000Z
writing miniseed format data to file= ./wf/Event_5.msd
Starting to retrieve data for event number 6  with origin time= 2011-03-11T06:25:50.740000Z
writing miniseed format data to file= ./wf/Event_6.msd
Starting to retrieve data for event number 7  with origin time= 2011-04-07T14:32:44.100000Z
writing miniseed format da