# Data Retrieval 
This notebook describes how to independently retrieve the data used in the Earthscope short course on MsPASS held in July 2024.  The procedures here are useful for retrieving small data sets via web services with MsPASS, but they are expected to become obsolete once the new Earthscope cloud system is fully functional.  

## Retrieval with MsPASS
If you are running MsPASS in the normal way, with the docker container or the anaconda package, the following script can be used.  It uses ObsPy's web service client to retrieve a set of event data from IRIS in QuakeML format, which ObsPy packages into what it calls a "Catalog" object.  We then use a method of the MsPASS Database class to save that data to MongoDB. Finally, we retrieve that data sorted in time order, load the result into a pandas DataFrame assigned to the symbol "df", and write it to a CSV file.  

In [None]:
from obspy import UTCDateTime
from obspy.clients.fdsn import Client
client=Client("IRIS")
ts=UTCDateTime('2011-01-01T00:00:00.0')
starttime=ts
te=UTCDateTime('2012-01-01T00:00:00.0')
endtime=te
lat0=38.3
lon0=142.5
minmag=7.0

cat=client.get_events(starttime=starttime,endtime=endtime,
        minmagnitude=minmag)
# this is a weird incantation suggested by obspy to print a summary of all the events
print(cat.__str__(print_all=True))

In [None]:
from mspasspy.db.database import Database   # This isn't strictly needed but used here because db set below is an instance of this class
import mspasspy.client as msc
dbclient=msc.Client()
db = dbclient.get_database('scoped2024')
n=db.save_catalog(cat)
print('number of event entries saved in source collection=',n)

In [None]:
import pandas as pd
# We need only these basics to compare to previous output as a cross check
projection={
    "time":1,
    "lat":1,
    "lon":1,
    "depth":1,
    "magnitude":1,
}
cursor=db.source.find({},projection).sort("time")
doclist=[]
for doc in cursor:
    doclist.append(doc)
df = pd.DataFrame(doclist)
print(df)
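
The "time" column printed above is numeric, which makes it hard to cross-check against the catalog summary by eye. The sketch below assumes the values are epoch seconds (consistent with the later cell that adds offsets in seconds and passes the result to UTCDateTime) and adds a human-readable column. The rows in `sample_docs` and the column name `origin_utc` are made up for illustration; they are not query results.

```python
import pandas as pd

# Hypothetical stand-in for the documents returned by db.source.find();
# the times are artificial epoch values, not real earthquake origins.
sample_docs = [
    {"time": 86400.0, "lat": 10.0, "lon": 20.0, "depth": 30.0, "magnitude": 7.2},
    {"time": 0.0, "lat": -5.0, "lon": 100.0, "depth": 15.0, "magnitude": 7.0},
]

# Sort by time, mirroring the .sort("time") applied to the MongoDB cursor
sample_df = pd.DataFrame(sample_docs).sort_values("time").reset_index(drop=True)

# Assumption: "time" holds seconds since 1970-01-01 UTC
sample_df["origin_utc"] = pd.to_datetime(sample_df["time"], unit="s", utc=True)
print(sample_df[["time", "origin_utc", "magnitude"]])
```

The same `pd.to_datetime` line could be applied to `df` itself if the assumption about epoch seconds holds for your database.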

In [None]:
csvfilename="workshop_sources.csv"
df.to_csv(csvfilename)

## Read CSV file to create/recreate DataFrame
Alternatively, if you are running this without an instance of MongoDB available and were supplied a copy of the CSV file created immediately above, you can load that file to create the DataFrame with the next box.  If you run this notebook from start to finish the next box is redundant, but it makes this notebook stateless: the waveform retrieval below can be run without the MongoDB steps.
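
One detail of the CSV round trip is worth knowing. Because `to_csv` is called with default arguments, pandas also writes the row index, and a plain `read_csv` returns it as an extra column named "Unnamed: 0". That is harmless for the waveform retrieval, which only uses the "time" column, but it can be surprising when comparing the two DataFrames. The sketch below illustrates this with a small made-up frame and an in-memory buffer instead of a real file; `sample_df` and its values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the source table
sample_df = pd.DataFrame({"time": [0.0, 86400.0], "magnitude": [7.0, 7.5]})

# Write with default arguments, as the save cell does
buffer = io.StringIO()
sample_df.to_csv(buffer)

# A plain read brings the saved index back as an extra column
buffer.seek(0)
with_index = pd.read_csv(buffer)
print(list(with_index.columns))

# index_col=0 tells pandas to treat that first column as the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(list(restored.columns))
```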

In [None]:
# these two lines are repeated to allow starting here instead of at the top
import pandas as pd
csvfilename="workshop_sources.csv"
df = pd.read_csv(csvfilename)
print(df)

## Waveform Retrieval
Finally, we retrieve the waveform data with obspy's get_waveforms method and save the results as a set of miniseed files, one per event, in a directory named "./wf", which is created if it does not already exist.  

Note that when I ran this with a fairly standard "high speed internet" connection, it took just under 2 hours.   

In [None]:
import os
import time
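# UTCDateTime is imported again here so this box also works when starting from the CSV file
from obspy import UTCDateTime
from obspy.clients.fdsn import RoutingClient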
client = RoutingClient("iris-federator")

t0=time.time()
outdir = "./wf"
# obspy's writer does not create a directory if it doesn't exist
# this standard incantation does that
wfdir_exists = os.path.exists(outdir)
if not wfdir_exists:
    os.makedirs(outdir)
    print("Output directory = ",outdir," did not exist and was created")
start_offset=300.0
end_offset=45*60.0
i=0
for origin_time in df["time"]: 
    print('Starting to retrieve data for event number',i,' with origin time=',UTCDateTime(origin_time))
    stime=origin_time+start_offset
    etime=origin_time+end_offset
    strm=client.get_waveforms(
            starttime=UTCDateTime(stime),
            endtime=UTCDateTime(etime),
            network='TA',
            channel='BH?',
            location='*'
        )
    fname = outdir + "/Event_{}.msd".format(i)
    print('writing miniseed format data to file=',fname)
    strm.write(fname,format='MSEED')
    i += 1

# i counts events (one miniseed file per event), not individual waveforms
print('Number of event files saved=',i)
t = time.time()
print('Time required for download=',t-t0)
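
Because the download can run for a couple of hours, an interrupted run is a real possibility, and the loop above would start again from event 0. The following is a minimal sketch of one way to make a rerun skip events whose file is already on disk. The helper `pending_events` is hypothetical (it is not part of ObsPy or MsPASS) and it only computes time windows and file names; the `get_waveforms` and `write` calls would still be made inside the loop that consumes it. A file left incomplete by an interrupted write would also be skipped, so check for that case before rerunning.

```python
import os
import tempfile

def pending_events(origin_times, outdir, start_offset=300.0, end_offset=45 * 60.0):
    """Yield (index, window_start, window_end, filename) for events not yet on disk.

    Hypothetical helper: times are epoch seconds and the offsets match the
    values used in the download loop.
    """
    for i, origin_time in enumerate(origin_times):
        fname = os.path.join(outdir, "Event_{}.msd".format(i))
        if os.path.exists(fname):
            continue  # file written by an earlier run, so skip this event
        yield i, origin_time + start_offset, origin_time + end_offset, fname

# Demonstration in a temporary directory with artificial origin times
with tempfile.TemporaryDirectory() as demo_dir:
    # Pretend event 0 was already downloaded by creating an empty file
    open(os.path.join(demo_dir, "Event_0.msd"), "wb").close()
    todo = list(pending_events([0.0, 1000.0], demo_dir))

for index, stime, etime, fname in todo:
    print(index, stime, etime, os.path.basename(fname))
```

In the real loop, `origin_times` would be `df["time"]` and `outdir` would be "./wf".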