# Parallel Processing
This notebook prototypes a practical receiver function (RF) estimation workflow and uses it to demonstrate MsPASS in a parallel setting.

In [1]:
from mspasspy.client import Client
mp_client = Client()
db = mp_client.get_database('SG2021')

## Prepare functions that will be used in the workflow

### arrival_slowness_vector
Given one member of an obspy arrival list, return the corresponding MsPASS SlownessVector.

Obspy's taup calculator returns travel time data as a list with one object for 
each seismic phase. Each object carries a ray parameter, which is the slowness in sec/degree.  
A slowness vector also has a direction, so we compute its components from the azimuth.

In [2]:
from obspy.geodetics import degrees2kilometers
import math
from mspasspy.ccore.seismic import SlownessVector

def arrival_slowness_vector(obspy_arrival,azimuth=0.0):
    """
    :param obspy_arrival: list member for which the slowness vector is to be computed.
    :param azimuth:  azimuth in degrees of propagation direction at receiver. 
    
    :return: SlownessVector from the model estimate for this phase.
    
    """
    # theta is the standard angle in math definition of polar coordinate angle (degrees)
    theta=90.0-azimuth
    rtheta=math.radians(theta)   # radians needed for math calculations
    p=obspy_arrival.ray_param_sec_degree   # ray parameter in s/degree
    u=p/degrees2kilometers(1.0)            # convert to s/km
    ux=u*math.cos(rtheta)
    uy=u*math.sin(rtheta)
    return SlownessVector(ux,uy,0.0)
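
As a quick check of the geometry, the sketch below repeats the same conversion without MsPASS or obspy. The function name `slowness_components`, the constant `KM_PER_DEGREE`, and the ray parameter of 4.6 s/degree are illustrative assumptions; the constant assumes a spherical Earth of radius 6371 km, which should be close to what `degrees2kilometers(1.0)` returns.

```python
import math

# Assumption: spherical Earth with a 6371 km radius (about 111.19 km per degree)
KM_PER_DEGREE = 2.0 * math.pi * 6371.0 / 360.0

def slowness_components(ray_param_sec_per_degree, azimuth_deg):
    """Return (ux, uy) in s/km with x pointing east and y pointing north."""
    # azimuth is measured clockwise from north; the math angle is
    # measured counterclockwise from east (the x axis)
    theta = math.radians(90.0 - azimuth_deg)
    u = ray_param_sec_per_degree / KM_PER_DEGREE
    return u * math.cos(theta), u * math.sin(theta)

# A wave propagating due east (azimuth 90) has all of its slowness on x
ux_east, uy_east = slowness_components(4.6, 90.0)
# A wave propagating due north (azimuth 0) has all of its slowness on y
ux_north, uy_north = slowness_components(4.6, 0.0)
print(ux_east, uy_east, ux_north, uy_north)
```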

### set_P_time
Sets a predicted P wave arrival time from the source and receiver coordinates and 
the travel time model passed as a parameter, then time shifts the data so time 0 is the predicted P wave arrival time.

In [3]:
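from mspasspy.util.decorators import mspass_func_wrapper
from mspasspy.ccore.utility import ErrorSeverity
from obspy.geodetics import gps2dist_azimuth, kilometers2degrees
# We need this function to handle setting arrival times.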
@mspass_func_wrapper
def set_P_time(d,model):
    stalat=d['site_lat']
    stalon=d['site_lon']
    srclat=d['source_lat']
    srclon=d['source_lon']
    depth=d['source_depth']
    otime=d['source_time']
    georesult=gps2dist_azimuth(srclat,srclon,stalat,stalon)
    # gps2dist_azimuth returns distance in m in element 0 of a tuple.
    # The travel time calculator expects degrees so we need this conversion
    dist=kilometers2degrees(georesult[0]/1000.0)
    baz=georesult[2]  # element 2 of the tuple is the back azimuth.  We need azimuth
    azimuth=baz+180.0
    if azimuth>360.0:
        azimuth -= 360.0
    # the taup calculator fails if we ask for P in the core shadow.  This is a rough 
    # way to handle this for this example that works for the one event we are processing here
    # A more elegant method would worry about source depth
    if dist>95.0:
        d.kill()
        d.elog.log_error('session2_RF_script','No P wave - station is in the core shadow',
                         ErrorSeverity.Invalid)
    else:
        arrivals=model.get_travel_times(source_depth_in_km=depth,distance_in_degree=dist,phase_list=['P'])
        # Arrivals are returned in time order 0 is always the first arrival
        # This computes arrival time as an epoch time and shifts the data to put 0 at that time
        a=arrivals[0]
        atime=a.time
        # Post the time used to Metadata
        d['P_iasp91']=atime   # Illustrates a made up key for Metadata
        d.ator(otime+atime)
        # We also post the slowness data - computed by this function
        u=arrival_slowness_vector(a,azimuth)
        d['ux']=u.ux
        d['uy']=u.uy
    return d
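
The back azimuth to azimuth step in `set_P_time` can also be written with the modulo operator, which wraps any input into the range [0, 360). This is a minimal sketch; `propagation_azimuth` is a hypothetical helper and not part of MsPASS or obspy.

```python
def propagation_azimuth(back_azimuth_deg):
    """Flip a back azimuth by 180 degrees and wrap the result into [0, 360)."""
    return (back_azimuth_deg + 180.0) % 360.0

az_a = propagation_azimuth(321.0)   # wraps past 360
az_b = propagation_azimuth(45.0)    # no wrap needed
az_c = propagation_azimuth(180.0)   # lands exactly on 360, which wraps to 0
print(az_a, az_b, az_c)
```

Unlike the `if azimuth>360.0` test, this form maps an azimuth of exactly 360 to 0.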

### apply_free_surface_transformation
Computes and applies the Kennett [1991] free surface transformation matrix.

Kennett [1991] gives the form for a free surface transformation operator
that reduces to a nonorthogonal transformation matrix when the wavefield is
not evanescent.  On output x1 will be transverse, x2 will be SV (radial),
and x3 will be longitudinal.

In [4]:
@mspass_func_wrapper
def apply_free_surface_transformation(d,vp0,vs0):
    """
    Thin wrapper for free_surface_transformation method of Seismogram that assumes
    the components of a slowness vector for the transformation are in the Metadata 
    of d stored with the keys ux and uy
    """
    if d.dead():
        return d
    if 'ux' in d and 'uy' in d:
        ux=d['ux']
        uy=d['uy']
        u = SlownessVector(ux,uy,0.0)
        d.free_surface_transformation(u,vp0,vs0)
    else:
        d.elog.log_error('session2_RF_script','Slowness vector components were not set',
                         ErrorSeverity.Invalid)
        d.kill()
    return d

### More functions
More functions can be found in our source code

## RF Estimation workflow:  Serial version
Above we assembled data into Seismogram objects and saved them to the database.  In this example workflow we generate a set of receiver function estimates from those Seismogram inputs.  The serial job is a data driven loop over all Seismogram objects stored in the database.  For each Seismogram we do the following calculations:
1.  Detrend the data (for a Seismogram that means channel by channel).
2.  Lightly taper the ends to reduce filter startup transients.
3.  Bandpass filter the data.
4.  Window the data around the P wave arrival time.
5.  Apply the free surface transformation.
6.  Run the deconvolution algorithm.
7.  Save the results.

### parameter setting
MsPASS allows parameters to be placed in an Antelope Pf format file.  We use that here as an example of how to put the parameters for a workflow in one place.

When using a pf to define constants, always read them up front so that errors in the file surface before any processing starts.

Example: session2.pf

data_taper_length 10.0 \
filter_high_corner 2.0 \
filter_low_corner 0.02 \
analysis_window_starttime -200.0 \
analysis_window_endtime 200.0 \
vp0 6.0 \
vs0 3.5
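
To illustrate why reading parameters up front is useful, the sketch below parses the same flat key and value pairs in plain Python and validates them before any processing would start. It is only an illustration: `parse_flat_pf`, `PF_TEXT`, and `REQUIRED_KEYS` are hypothetical names, the parser handles nothing beyond one `key value` pair per line, and the workflow itself uses `AntelopePf` instead.

```python
# Assumption: the parameter file holds one "key value" pair per line
PF_TEXT = """
data_taper_length 10.0
filter_high_corner 2.0
filter_low_corner 0.02
analysis_window_starttime -200.0
analysis_window_endtime 200.0
vp0 6.0
vs0 3.5
"""

REQUIRED_KEYS = (
    "data_taper_length", "filter_high_corner", "filter_low_corner",
    "analysis_window_starttime", "analysis_window_endtime", "vp0", "vs0",
)

def parse_flat_pf(text):
    """Parse flat 'key value' lines into a dict of floats."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        params[key] = float(value)   # raises ValueError on a malformed number
    return params

params = parse_flat_pf(PF_TEXT)
# Fail early, before any data are read, if something is missing or inconsistent
missing = [k for k in REQUIRED_KEYS if k not in params]
if missing:
    raise KeyError("missing parameters: " + ", ".join(missing))
if params["filter_low_corner"] >= params["filter_high_corner"]:
    raise ValueError("filter_low_corner must be below filter_high_corner")
if params["analysis_window_starttime"] >= params["analysis_window_endtime"]:
    raise ValueError("analysis window start must precede its end")
print(params)
```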

In [5]:
import time
from mspasspy.algorithms.RFdeconProcessor import RFdeconProcessor
from mspasspy.algorithms.RFdeconProcessor import RFdecon
from mspasspy.ccore.utility import AntelopePf
from mspasspy.algorithms.window import WindowData
from mspasspy.algorithms.signals import (filter, detrend)
from mspasspy.ccore.algorithms.basic import TimeWindow,CosineTaper
from mspasspy.ccore.utility import ErrorSeverity
from obspy.taup import TauPyModel
model = TauPyModel(model="iasp91")
from obspy.geodetics import gps2dist_azimuth,kilometers2degrees

pfhandle=AntelopePf('session2.pf')

dtaperlength=pfhandle.get_double("data_taper_length")
fmax=pfhandle.get_double("filter_high_corner")
fmin=pfhandle.get_double("filter_low_corner")
awin_start=pfhandle.get_double("analysis_window_starttime")
awin_end=pfhandle.get_double("analysis_window_endtime")
vp0=pfhandle.get_double('vp0')
vs0=pfhandle.get_double('vs0')

### Obtain a record in wf_Seismogram collection

In [6]:
# fetch one document from the wf_Seismogram collection
doc = db.wf_Seismogram.find_one({})

In [7]:
# see what it looks like
print(doc)

{'_id': ObjectId('610d546d5719f54c84c12904'), 'cardinal': False, 'delta': 0.025, 'hang': 0.0, 'nbytes': 143360, 'site_id': ObjectId('600fff404b4f9e654b4dd645'), 'dfile': 'file5', 'orthogonal': False, 'channel_endtime': 1367193599.0, 'vang': 0.0, 'channel_elev': 0.185, 'chan': 'BHZ', 'sampling_rate': 40.0, 'channel_starttime': 1349740800.0, 'channel_lon': -90.571503, 'source_id': ObjectId('61076db5ad4e0df4015f547c'), 'channel_edepth': 0.0, 'last_packet_time': 1356825909.865, 'npts': 24001, 'channel_id': ObjectId('600fff404b4f9e654b4dd647'), 'foff': 0, 'starttime': 1356822806.9258306, 'tmatrix': [0.0, 0.0, 1.0, 2.6484540326036093e-14, 1.0, 2.6484540326036093e-14, 1.0, 0.0, 2.6484540326036093e-14], 'time_standard': 'UTC', 'utc_convertible': True, 'dir': '/tmp/data_files', 'channel_lat': 37.361099, 'storage_mode': 'file', 'history_object_id': 'bd660c4f-f5e3-4a4a-8511-cc0f49196ada', 'data_tag': 'rawdata'}


### Read the record and return a Seismogram class object

In [8]:
normlist=['source','site']
d = db.read_data(doc,collection='wf_Seismogram',normalize=normlist)

### Detrend the Seismogram

In [9]:
detrend(d)

Seismogram({'_id': ObjectId('610d546d5719f54c84c12904'), 'calib': 1.000000, 'cardinal': False, 'chan': 'E', 'channel_edepth': 0.000000, 'channel_elev': 0.185000, 'channel_endtime': 1367193599.000000, 'channel_id': ObjectId('600fff404b4f9e654b4dd647'), 'channel_lat': 37.361099, 'channel_lon': -90.571503, 'channel_starttime': 1349740800.000000, 'data_tag': 'rawdata', 'delta': 0.025000, 'dfile': 'file5', 'dir': '/tmp/data_files', 'endtime': 1356823406.925831, 'foff': 0, 'hang': 0.000000, 'history_object_id': 'bd660c4f-f5e3-4a4a-8511-cc0f49196ada', 'last_packet_time': 1356825909.865000, 'loc': '', 'nbytes': 143360, 'net': 'ZL', 'npts': 24001, 'orthogonal': False, 'processing': ["ObsPy 1.2.2: detrend(options={}::type='simple')"], 'sampling_rate': 40.000000, 'site_elev': 0.185000, 'site_endtime': 1367193599.000000, 'site_id': ObjectId('600fff404b4f9e654b4dd645'), 'site_lat': 37.361099, 'site_lon': -90.571503, 'site_starttime': 1349740800.000000, 'source_depth': 32.800000, 'source_id': Object

### Use CosineTaper to taper the Seismogram

In [10]:
dtaper = CosineTaper(d.t0, d.t0+dtaperlength, d.endtime()-dtaperlength, d.endtime())
dtaper.apply(d)

0
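
To make the four taper arguments concrete, here is a conceptual sketch of a cosine taper weight as a function of time. It assumes the arguments mean (start of head ramp, end of head ramp, start of tail ramp, end of tail ramp), which is how the call above uses them. `cosine_taper_weight` is a hypothetical function and not the MsPASS implementation, and the 0 to 600 s time span is made up for the example.

```python
import math

def cosine_taper_weight(t, t0head, t1head, t1tail, t0tail):
    """Weight in [0, 1]: 0 outside the data, 1 in the middle, cosine ramps at the ends."""
    if t <= t0head or t >= t0tail:
        return 0.0
    if t < t1head:
        # ramp up from 0 to 1 across the head
        x = (t - t0head) / (t1head - t0head)
        return 0.5 * (1.0 - math.cos(math.pi * x))
    if t > t1tail:
        # ramp down from 1 to 0 across the tail
        x = (t0tail - t) / (t0tail - t1tail)
        return 0.5 * (1.0 - math.cos(math.pi * x))
    return 1.0

# Hypothetical data spanning 0 to 600 s with a 10 s taper on each end
times = (0.0, 5.0, 10.0, 300.0, 595.0, 600.0)
weights = [cosine_taper_weight(t, 0.0, 10.0, 590.0, 600.0) for t in times]
print(weights)
```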

### Bandpass filtering the Seismogram

In [11]:
filter(d,'bandpass',freqmax=fmax,freqmin=fmin)

Seismogram({'_id': ObjectId('610d546d5719f54c84c12904'), 'calib': 1.000000, 'cardinal': False, 'chan': 'E', 'channel_edepth': 0.000000, 'channel_elev': 0.185000, 'channel_endtime': 1367193599.000000, 'channel_id': ObjectId('600fff404b4f9e654b4dd647'), 'channel_lat': 37.361099, 'channel_lon': -90.571503, 'channel_starttime': 1349740800.000000, 'data_tag': 'rawdata', 'delta': 0.025000, 'dfile': 'file5', 'dir': '/tmp/data_files', 'endtime': 1356823406.925831, 'foff': 0, 'hang': 0.000000, 'history_object_id': 'bd660c4f-f5e3-4a4a-8511-cc0f49196ada', 'last_packet_time': 1356825909.865000, 'loc': '', 'nbytes': 143360, 'net': 'ZL', 'npts': 24001, 'orthogonal': False, 'processing': ["ObsPy 1.2.2: detrend(options={}::type='simple')", "ObsPy 1.2.2: filter(options={'freqmax': 2.0, 'freqmin': 0.02}::type='bandpass')", "ObsPy 1.2.2: filter(options={'freqmax': 2.0, 'freqmin': 0.02}::type='bandpass')", "ObsPy 1.2.2: filter(options={'freqmax': 2.0, 'freqmin': 0.02}::type='bandpass')"], 'sampling_rate':

### Window the Seismogram

1. compute delta and azimuth

In [12]:
stalat=d['site_lat']
stalon=d['site_lon']
srclat=d['source_lat']
srclon=d['source_lon']
depth=d['source_depth']
otime=d['source_time']

georesult=gps2dist_azimuth(srclat,srclon,stalat,stalon)
# gps2dist_azimuth returns distance in m in element 0 of a tuple.
# The travel time calculator expects degrees so we need this conversion
dist=kilometers2degrees(georesult[0]/1000.0)
baz=georesult[2]  # element 2 of the tuple is the back azimuth.  We need azimuth
azimuth=baz+180.0
if azimuth>360.0:
    azimuth -= 360.0
print('delta=',dist,' azimuth=',azimuth)

delta= 91.72518858842062  azimuth= 141.11883513432912


2. Compute arrival time and shift t0 to P wave arrival time

In [13]:
arrivals=model.get_travel_times(source_depth_in_km=depth,distance_in_degree=dist,phase_list=['P'])
# Arrivals are returned in time order, so element 0 is always the first arrival.
# a.time is the travel time; adding the origin time gives the arrival as an epoch
# time, and ator shifts the data to put 0 at that time
a=arrivals[0]
atime=a.time
# Shift time 0 to the P wave arrival time
d.ator(otime+atime)
# Post the time used to Metadata
d['P_iasp91']=atime   # Illustrates a made up key for Metadata

3. Window Data

In [14]:
decon_twin=TimeWindow(awin_start,awin_end)
print(decon_twin.start,decon_twin.end)
print(d.t0,d.endtime())
print('sample interval=',d.dt,' and number of points=',d.npts)
d=WindowData(d,awin_start,awin_end)

-200.0 200.0
-300.0 300.0
sample interval= 0.025  and number of points= 24001
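
The printed values let us predict the size of the windowed data. The sketch below assumes both window endpoints are kept, so the sample count is the window length divided by the sample interval, plus one. The exact endpoint handling of `WindowData` is not confirmed here, and the variable names are illustrative.

```python
# Values copied from the output printed above
dt = 0.025
data_start, data_end = -300.0, 300.0
win_start, win_end = -200.0, 200.0

# Assumption: endpoints are inclusive, hence the "+ 1"
npts_input = round((data_end - data_start) / dt) + 1
npts_window = round((win_end - win_start) / dt) + 1
# The window can only be cut if it lies entirely inside the data span
window_fits = data_start <= win_start and win_end <= data_end
print(npts_input, npts_window, window_fits)
```

The input count reproduces the 24001 points reported above, which supports the inclusive endpoint assumption for the input data.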


### Apply free surface transformation matrix

In [15]:
u=arrival_slowness_vector(a,azimuth)
d.free_surface_transformation(u,vp0,vs0)

### Apply deconvolution algorithm

RFdecon is a function predefined in MsPASS.  It supports the following algorithms:
1. LeastSquares
2. WaterLevel
3. MultiTaperXcor
4. MultiTaperSpecDiv
5. GeneralizedIterative

In [16]:
decondata=RFdecon(d,'MultiTaperXcor')

### Check that the Seismogram is still live after processing

In [17]:
decondata.live

True

### Save the Seismogram after workflow

In [18]:
db.save_data(decondata, data_tag='example_output')
print('The seismogram is saved successfully or not: ', decondata.live)

The seismogram is saved successfully or not:  True


## Run a serial workflow with 10 Seismograms and measure performance

In [19]:
# the number of input seismograms
record_num = 10
cursor=db.wf_Seismogram.find({},limit=record_num)

t0=time.time()
nlive=0
normlist=['source','site']
for doc in cursor:
    d=db.read_data(doc,collection='wf_Seismogram',normalize=normlist)
    print('working on data for station=',d['sta'])
    # detrend
    detrend(d)
    # cosine taper ends (applied before filtering to reduce filter startup transients)
    dtaper=CosineTaper(d.t0,d.t0+dtaperlength,d.endtime()-dtaperlength,d.endtime())
    dtaper.apply(d)
    # bandpass filter
    filter(d,'bandpass',freqmax=fmax,freqmin=fmin)
    # Time windowing - variant of above example 
    stalat=d['site_lat']
    stalon=d['site_lon']
    srclat=d['source_lat']
    srclon=d['source_lon']
    depth=d['source_depth']
    otime=d['source_time']
    georesult=gps2dist_azimuth(srclat,srclon,stalat,stalon)
    # gps2dist_azimuth returns distance in m in element 0 of a tuple.
    # The travel time calculator expects degrees so we need this conversion
    dist=kilometers2degrees(georesult[0]/1000.0)
    baz=georesult[2]  # element 2 of the tuple is the back azimuth.  We need azimuth
    azimuth=baz+180.0
    if azimuth>360.0:
        azimuth -= 360.0
    if dist>95.0:
        d.kill()
        d.elog.log_error('session2_serial_script','No P wave - station is in the core shadow',ErrorSeverity.Invalid)
        print('Killed this datum - core shadow')
        db.save_data(d,data_tag='decon_output')
        continue
    arrivals=model.get_travel_times(source_depth_in_km=depth,distance_in_degree=dist,phase_list=['P'])
    # Arrivals are returned in time order 0 is always the first arrival
    # This computes arrival time as an epoch time and shifts the data to put 0 at that time
    a=arrivals[0]
    atime=a.time
    # Shift time 0 to the P wave arrival time
    d.ator(otime+atime)
    # Post the time used to Metadata
    d['P_iasp91']=atime   # Illustrates a made up key for Metadata
    decon_twin=TimeWindow(awin_start,awin_end)
    if decon_twin.start < d.t0:
        d.kill()
        d.elog.log_error('session2_serial_script',
                         'Windowing failure - window start is before data starttime',ErrorSeverity.Invalid)
        print('killed this datum - windowing error')
        db.save_data(d,data_tag='decon_output')
    else:
        d=WindowData(d,awin_start,awin_end)
        # We transform the data to R,T,L using Kennett's free surface transformation matrix, which 
        # is implemented as a method in Seismogram
        u=arrival_slowness_vector(a,azimuth)
        d.free_surface_transformation(u,vp0,vs0)
        # run deconvolution
        decondata=RFdecon(d,'MultiTaperXcor')
        # save result with a different data tag - automatically will go to wf_Seismogram
        db.save_data(decondata, data_tag='decon_output')
        if decondata.live:
            nlive+=1
print('Total processing time=',time.time()-t0)
print('Number of live data save=',nlive)

working on data for station= N27M
working on data for station= N26I
working on data for station= N24I
working on data for station= N23I
working on data for station= N22I
working on data for station= N21M
working on data for station= W315
Killed this datum - core shadow
working on data for station= W31
Killed this datum - core shadow
working on data for station= W30
Killed this datum - core shadow
working on data for station= W29
Killed this datum - core shadow
Total processing time= 1.1130340099334717
Number of live data save= 6


## RF Estimation:  parallel job using Dask
MsPASS supports two schedulers:  Dask and Spark.  In this exercise we use Dask because it is slightly simpler to use.  A later section covers the details of this job script.  For now the key point is that a job script that runs a parallel job in MsPASS differs only slightly from the serial version.

One point helps in reading this job script:  a fundamental idea of both Spark and Dask is the map operator.  A map operator can be thought of as a function that takes a list of data objects (the dataset), does something to each of them, and creates a new list (dataset) of the modified data.  The schedulers handle the memory operations so the entire data set does not need to live in memory simultaneously. 

With that background, we first show a minimal map example and then the workflow above in parallel form.  For this notebook we could have dropped most of the initialization in the parallel cell, but we retain it to emphasize the parallel structure.

### Example use for the map operation in mspass

In [20]:
import dask.bag as daskbag

def inc(x):
    return x + 1

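daskclient = mp_client.get_scheduler()

# build a bag from a sequence, then apply inc to every element
data_set = daskbag.from_sequence(range(100))
data_set = data_set.map(inc)
# nothing is computed until compute() is called
res = data_set.compute()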
print(res)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
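
The workflow cells pass extra arguments through `map`, for example `dataset.map(filter,'bandpass', freqmax=fmax, freqmin=fmin)`. The pure Python sketch below is only an analogy for that calling pattern and for lazy evaluation. `lazy_map`, `traced_scale`, and `shift` are hypothetical names, and this is not how Dask is implemented.

```python
def lazy_map(func, items, *args, **kwargs):
    """Yield func(item, *args, **kwargs) one item at a time."""
    for item in items:
        yield func(item, *args, **kwargs)

calls = []   # records which inputs have actually been processed

def traced_scale(x, factor):
    calls.append(x)
    return x * factor

def shift(x, offset=0):
    return x + offset

# Chain two map steps; extra positional and keyword arguments follow the function
pipeline = lazy_map(traced_scale, range(5), 10)
pipeline = lazy_map(shift, pipeline, offset=1)

calls_before = len(calls)   # still zero: building the chain runs nothing
result = list(pipeline)     # consuming the chain triggers the work
print(calls_before, result)
```

Each item flows through both steps before the next item is touched, which is the sense in which the whole dataset never has to be held in memory at once.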


### Parallel workflow with 10 seismograms

In [21]:
import time
import dask.bag
from dask.distributed import Client as DaskClient
from mspasspy.algorithms.RFdeconProcessor import RFdeconProcessor
from mspasspy.algorithms.RFdeconProcessor import RFdecon
from mspasspy.ccore.utility import AntelopePf
# These are repeated from above, but useful to make this box standalone so one can more 
# easily just cut and paste to use it in another workflow
from mspasspy.algorithms.window import WindowData
from mspasspy.algorithms.basic import cosine_taper, free_surface_transformation
from mspasspy.algorithms.signals import (filter, detrend)
from mspasspy.ccore.algorithms.basic import TimeWindow
from mspasspy.ccore.utility import ErrorSeverity
from mspasspy.db.database import read_distributed_data
from obspy.taup import TauPyModel
model = TauPyModel(model="iasp91")
from obspy.geodetics import gps2dist_azimuth, kilometers2degrees
normlist=['source','site']

# These initializations are identical to the serial version
# MsPASS allows parameters to be placed in a Antelope Pf format file.  We use 
# that here as an example of how to put parameters for a workflow in one place
pfhandle=AntelopePf('session2.pf')
# When using a pf to define constants always do that up front in case there are
# errors in the file
dtaperlength=pfhandle.get_double("data_taper_length")
fmax=pfhandle.get_double("filter_high_corner")
fmin=pfhandle.get_double("filter_low_corner")
awin_start=pfhandle.get_double("analysis_window_starttime")
awin_end=pfhandle.get_double("analysis_window_endtime")
vp0=pfhandle.get_double('vp0')
vs0=pfhandle.get_double('vs0')

# There is a fair amount of overhead to create the slepian tapers used in 
# the multitaper method.   Creating one operator instance up front avoids
# repeating that cost.  Note that the map chain below passes the algorithm
# name to RFdecon and does not reference decon_operator.
decon_operator=RFdeconProcessor(alg="MultiTaperXcor")

# initialize the dask client
daskclient = mp_client.get_scheduler()

record_num = 10
cursor=db.wf_Seismogram.find({"data_tag": "rawdata"}, limit=record_num)
t0=time.time()

# this script is identical to the serial script prior to this point.  
# Here is the first fundamental change:  our for loop is replaced by 
# this parallel reader that builds a Dask bag used to define the data set
dataset=read_distributed_data(db, cursor, normalize=normlist)
dataset=dataset.map(detrend)
# cosine_taper parameters here are arbitrary values chosen for testing
dataset=dataset.map(cosine_taper, 0.0, 30.0, 150.0, 180.0)
dataset=dataset.map(filter,'bandpass', freqmax=fmax, freqmin=fmin)
dataset=dataset.map(set_P_time, model)
dataset=dataset.map(WindowData, awin_start, awin_end)
# the slowness vector is rebuilt from the ux and uy Metadata posted by set_P_time
dataset=dataset.map(apply_free_surface_transformation, vp0, vs0)
dataset=dataset.map(RFdecon, 'MultiTaperXcor')
dataset=dataset.map(db.save_data, collection='wf_Seismogram', data_tag='parallel_decon_output_10')
save_result = dataset.compute()
# count the seismograms that are still live after the save
nlive = 0
for seis in save_result:
    if seis.live:
        nlive += 1

print('Total processing time for 10 seismograms=', time.time()-t0)
print('Number of live data save=',nlive)

Total processing time for 10 seismograms= 7.833009243011475
Number of live data save= 6


## Performance Analysis

### Parallel workflow with 100 seismograms

In [22]:
record_num = 100
cursor=db.wf_Seismogram.find({"data_tag": "rawdata"}, limit=record_num)
t0=time.time()
# same map chain as the 10 seismogram cell above
dataset=read_distributed_data(db, cursor, normalize=normlist)
dataset=dataset.map(detrend)
# cosine_taper parameters here are arbitrary values chosen for testing
dataset=dataset.map(cosine_taper, 0.0, 30.0, 150.0, 180.0)
dataset=dataset.map(filter,'bandpass', freqmax=fmax, freqmin=fmin)
dataset=dataset.map(set_P_time, model)
dataset=dataset.map(WindowData, awin_start, awin_end)
# the slowness vector is rebuilt from the ux and uy Metadata posted by set_P_time
dataset=dataset.map(apply_free_surface_transformation, vp0, vs0)
dataset=dataset.map(RFdecon, 'MultiTaperXcor')
dataset=dataset.map(db.save_data, collection='wf_Seismogram', data_tag='parallel_decon_output_100')
save_result = dataset.compute()
# count the seismograms that are still live after the save
nlive = 0
for seis in save_result:
    if seis.live:
        nlive += 1

print('Total processing time for 100 seismograms=', time.time()-t0)
print('Number of live data save=',nlive)

Total processing time for 100 seismograms= 10.004013299942017
Number of live data save= 81


### Parallel workflow with 1000 seismograms

In [23]:
record_num = 1000
cursor=db.wf_Seismogram.find({"data_tag": "rawdata"}, limit=record_num)
t0=time.time()
# same map chain as the 10 seismogram cell above
dataset=read_distributed_data(db, cursor, normalize=normlist)
dataset=dataset.map(detrend)
# cosine_taper parameters here are arbitrary values chosen for testing
dataset=dataset.map(cosine_taper, 0.0, 30.0, 150.0, 180.0)
dataset=dataset.map(filter,'bandpass', freqmax=fmax, freqmin=fmin)
dataset=dataset.map(set_P_time, model)
dataset=dataset.map(WindowData, awin_start, awin_end)
# the slowness vector is rebuilt from the ux and uy Metadata posted by set_P_time
dataset=dataset.map(apply_free_surface_transformation, vp0, vs0)
dataset=dataset.map(RFdecon, 'MultiTaperXcor')
dataset=dataset.map(db.save_data, collection='wf_Seismogram', data_tag='parallel_decon_output_1000')
save_result = dataset.compute()
# count the seismograms that are still live after the save
nlive = 0
for seis in save_result:
    if seis.live:
        nlive += 1

print('Total processing time for 1000 seismograms=', time.time()-t0)
print('Number of live data save=',nlive)

Total processing time for 1000 seismograms= 11.276261568069458
Number of live data save= 876


### Parallel workflow with 10000 seismograms

In [24]:
record_num = 10000
cursor=db.wf_Seismogram.find({"data_tag": "rawdata"}, limit=record_num)
t0=time.time()
# same map chain as the 10 seismogram cell above
dataset=read_distributed_data(db, cursor, normalize=normlist)
dataset=dataset.map(detrend)
# cosine_taper parameters here are arbitrary values chosen for testing
dataset=dataset.map(cosine_taper, 0.0, 30.0, 150.0, 180.0)
dataset=dataset.map(filter,'bandpass', freqmax=fmax, freqmin=fmin)
dataset=dataset.map(set_P_time, model)
dataset=dataset.map(WindowData, awin_start, awin_end)
# the slowness vector is rebuilt from the ux and uy Metadata posted by set_P_time
dataset=dataset.map(apply_free_surface_transformation, vp0, vs0)
dataset=dataset.map(RFdecon, 'MultiTaperXcor')
dataset=dataset.map(db.save_data, collection='wf_Seismogram', data_tag='parallel_decon_output_10000')
save_result = dataset.compute()
# count the seismograms that are still live after the save
nlive = 0
for seis in save_result:
    if seis.live:
        nlive += 1

print('Total processing time for 10000 seismograms=', time.time()-t0)
print('Number of live data save=',nlive)

Total processing time for 10000 seismograms= 101.30984115600586
Number of live data save= 8852
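
The timings reported in this notebook can be condensed into throughput figures. The sketch below only rearranges numbers printed in the cell outputs; no new measurements are involved, and the variable names are illustrative.

```python
# (label, number of seismograms, wall time in seconds) copied from the cell outputs
timings = [
    ("serial", 10, 1.1130340099334717),
    ("parallel", 10, 7.833009243011475),
    ("parallel", 100, 10.004013299942017),
    ("parallel", 1000, 11.276261568069458),
    ("parallel", 10000, 101.30984115600586),
]

rows = []
for label, count, seconds in timings:
    throughput = count / seconds   # seismograms processed per second
    rows.append((label, count, throughput))
    print(f"{label:8s} n={count:5d} {throughput:8.2f} seismograms/s")
```

On these numbers the parallel run is slower than the serial loop for 10 seismograms, which suggests that a fixed startup cost dominates small jobs, while throughput rises steadily as the job grows.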
