# IRIS PASSCAL - RT130 Data Processing: Step 2 - Generating parameter files

A Jupyter notebook by Glenn Thompson based on: https://www.passcal.nmt.edu/webfm_send/3035

You’ve offloaded a service run and have data from each RT130. Follow the steps in this document to convert the data to miniSEED and reorganize it into station/channel/day volumes. Then, create a stationXML for your experiment using Nexus (see step 7) before submitting data to PASSCAL. Program names are in italics. Unix commands and any command line arguments are on separate lines. Input files are denoted by `<filename>`. Additional documentation can be found on the PASSCAL website: https://www.passcal.nmt.edu/content/passive-source-seed-archiving-documentation

Import needed modules and set global variables

## STEP 1. Create an organized directory structure for your data. 
Start by creating a main directory for the project *(in this Jupyter Notebook, I use the variable REFTEKDIR for this main project directory)*. 

Under your main project directory, make a first level directory “SVC1” for service run number 1. For each subsequent service run create a new directory, e.g. SVC2, SVC3. Create directories in the SVC1 directory for the raw data files and log files. For example: 

    mkdir RAW
    mkdir LOGS

Move the raw data files (either .ZIP or CF folders) into the RAW directory, e.g.

    mv SVC1/ZIPFILES/*.ZIP SVC1/RAW/
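
The same layout can also be created from Python, which is convenient inside a notebook. The following is a minimal sketch, not part of the PASSCAL instructions: the function name `make_service_run_dirs` is hypothetical, and it assumes service-run directories are named `SVC<n>` as in the text above.

```python
from pathlib import Path
import tempfile

def make_service_run_dirs(project_dir, run_number, subdirs=("RAW", "LOGS")):
    """Create SVC<run_number>/ with its subdirectories; return the SVC path."""
    svc_dir = Path(project_dir) / f"SVC{run_number}"
    for name in subdirs:
        # parents=True creates SVC<n>/ as needed; exist_ok=True makes reruns harmless
        (svc_dir / name).mkdir(parents=True, exist_ok=True)
    return svc_dir

# Demonstrate in a throwaway directory rather than a real project directory
with tempfile.TemporaryDirectory() as project_dir:
    svc1 = make_service_run_dirs(project_dir, 1)
    print(sorted(p.name for p in svc1.iterdir()))  # ['LOGS', 'RAW']
```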

<b>Glenn's variations:</b> While I sometimes used *Neo* to read the Compact Flash cards, compress the data to ZIP format, and then copy the data to the laptop, I mostly just copied the data using the macOS command line, as I found it quicker, e.g.

    cp -r /Volumes/UNTITLED/RT130-*/2* SVC1/RAW/
    
At the time of writing this Jupyter Notebook, I had already completed the field project and had all the data organized into a directory structure that looks like:

<pre>
    SVC1/
        RAW/
            2018288/
                AB13/
                    0/
                    1/
                    9/
                9D7C/
                    0/
                    1/
                    9/
                ...
            2018289/
                ...
            ...
</pre>
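
To check what a RAW directory actually contains, the tree can be walked one day directory and one digitizer directory at a time. This is a rough sketch assuming the `RAW/<yyyyjjj>/<digitizer>/` layout shown above; `list_day_digitizers` is a hypothetical helper, not a PASSCAL tool.

```python
from datetime import datetime
from pathlib import Path
import tempfile

def list_day_digitizers(raw_dir):
    """Yield (date, digitizer) for every RAW/<yyyyjjj>/<digitizer>/ directory."""
    for day_dir in sorted(Path(raw_dir).glob("20?????")):
        if not (day_dir.is_dir() and day_dir.name.isdigit()):
            continue
        # yyyyjjj is year plus day-of-year, e.g. 2018288 is 2018-10-15
        day = datetime.strptime(day_dir.name, "%Y%j").date()
        for das_dir in sorted(day_dir.glob("????")):
            if das_dir.is_dir():
                yield day, das_dir.name

# Build a tiny example tree in a temporary directory and list it
with tempfile.TemporaryDirectory() as raw_dir:
    for das in ("AB13", "9D7C"):
        for stream in ("0", "1", "9"):
            (Path(raw_dir) / "2018288" / das / stream).mkdir(parents=True)
    for day, das in list_day_digitizers(raw_dir):
        print(day, das)
```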

In [None]:
import os
import glob
import shutil
from pathlib import Path
import numpy as np
import pandas as pd
import datetime
import obspy
import matplotlib.pyplot as plt
plt.rcParams['axes.formatter.useoffset'] = False # do not allow relative y-labels

#############################
####  STEP ONE FUNCTIONS ####
#############################
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

def read_rt130_1_file(rt130file):
    #print('Processing %s' % rt130file, end="\r", flush=True)
    #print(rt130file)
    all_lats = [] 
    londeg = None
    latdeg = None
    output = os.popen('strings %s' % rt130file).read()
    if output: 

        #print(type(output))
        lines = output.split('\n')
        #print(len(lines))
        '''for i,line in enumerate(lines[0:3]):
            print(f"[{i}], {line}, [length={len(line)}]")
        '''

        lineindex = 1
        latindex = lines[lineindex].find('YABBBBBC        N ')
        if latindex == -1:
            for i, line in enumerate(lines):
                latindex = line.find('YABBBBBC        N ')
                if latindex > -1:
                    lineindex = i
                    break
                else:
                    latindex = line.find(' N ')
                    if latindex > -1:
                        lineindex = i
                        latindex -= 15
        if latindex > -1:
            latindex += 15
            try:
                lonindex = list(find_all(lines[lineindex][latindex:latindex+20], 'W'))[0]+latindex
                latdeg = float(lines[lineindex][latindex+3:latindex+5])
                latmin = float(lines[lineindex][latindex+5:latindex+11])
                latdeg = latdeg + latmin/60
                londeg = float(lines[lineindex][lonindex+1:lonindex+4])
                lonmin = float(lines[lineindex][lonindex+4:lonindex+10])
                londeg = -(londeg + lonmin/60)    
            except (ValueError, IndexError):
                # Position string was not in the expected layout; print it for debugging
                print(lineindex, latindex, lines[lineindex]) 
        #print(latdeg, londeg)
        
        all_indexes = list(find_all(output, 'POSITION'))

        #print(all_indexes)

        #print('')

        
    
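        for each_index in all_indexes:
            
            Ndeg = output[each_index+11:each_index+13]
            Nmin = output[each_index+14:each_index+16]
            Nsec = output[each_index+17:each_index+22]
            try:
                Ndecdeg = float(Ndeg) + float(Nmin)/60 + float(Nsec)/3600
            except ValueError:
                continue # skip POSITION strings that are garbled or truncated
            # keep only latitudes within the range expected for this experiment
            if Ndecdeg > 28.0 and Ndecdeg < 29.0:
                all_lats.append(Ndecdeg)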

    #print(f"latdeg={latdeg}, londeg={londeg}, all_lats={all_lats}")
    return latdeg, londeg, all_lats  

def create_summary1file(SVCDIR, RAWDIR):
    """ Parse RT130 1/* files to reconstruct digitizer-GPS_Position history
    This generates a DataFrame/CSV file like:
    
    ,Digitizer,yyyyjjj,Latitude,LatSTD
    0,92B7,2018222,28.0595,2.8867513458681453e-05
    1,92B7,2018223,28.0595,1.8559214542252517e-05
    2,92B7,2018229,28.573463888888888,2.721655270429592e-06
    3,92B7,2018230,28.573469444444445,1.7899429988238652e-06
    4,92B7,2018231,28.573469444444445,2.6032870393506316e-06
    5,92B7,2018233,28.573469444444445,3.552713678800501e-15
    6,92B7,2018234,28.573469444444445,3.552713678800501e-15
    7,92B7,2018235,28.573469444444445,3.552713678800501e-15
    """
        
    positionDF = pd.DataFrame(columns = ['Digitizer', 'yyyyjjj', 'Latitude', 'LatSTD'])
    digitizerList = list()
    yyyyjjjList = list()
    latitudeList = list()
    latSTDList = list()
    longitudeList = list()
    lonSTDList = list()
    dayfullpaths = sorted(glob.glob('%s/20?????' % RAWDIR))

    summary1file = os.path.join(SVCDIR, 'summary1file.csv')
    if os.path.exists(summary1file):
        os.remove(summary1file)
    count = 0

    for thisdayfullpath in dayfullpaths:
        count = count + 1
        #print('Processing %s (%d of %d)' % (thisdayfullpath, count, len(dayfullpaths) ), end="\r", flush=True)
        thisdaydir = os.path.basename(thisdayfullpath) # a directory like 2018365
        digitizerpaths = sorted(glob.glob('%s/????' % thisdayfullpath))
        for digitizerpath in digitizerpaths:
            thisdigitizer = os.path.basename(digitizerpath)
            rt130files = sorted(glob.glob('%s/1/*' % digitizerpath))
            if rt130files:
                all_positions = list()
                latdegs = []
                londegs = []
                for rt130file in rt130files:
                    #if not rt130file == '/home/thompsong/work/PROJECTS/KSCpasscal/SVC02/RAW/2019013/91F8/1/090000000_0036EE80':
                    #    continue
                    #print('Processing %s' % rt130file, end="\r", flush=True)
                    latdeg, londeg, all_lats = read_rt130_1_file(rt130file)
                    if latdeg:
                        latdegs.append(latdeg)
                    if londeg:
                        londegs.append(londeg)
                    if all_lats:
                        if latdeg:
                            all_positions.append(latdeg)
                        all_positions.extend(all_lats)

                digitizerList.append(thisdigitizer)
                yyyyjjjList.append(thisdaydir)

                if all_positions:  
                    all_positions_np = np.array(all_positions)
                    #print(all_positions_np)
                    #print(np.median(all_positions_np))
                    #print('%s, %s, %f, %f\n' % (thisdigitizer, thisdaydir, np.median(all_positions_np), np.std(all_positions_np)) )

                    latitudeList.append(np.nanmedian(all_positions_np))
                    latSTDList.append(np.nanstd(all_positions_np))
                elif latdegs:
                    latitudeList.append(np.nanmedian(latdegs))
                    latSTDList.append(np.nanstd(latdegs))
                else:
                    latitudeList.append(None)
                    latSTDList.append(None)                    
                if londegs:
                    longitudeList.append(np.nanmedian(londegs))
                    lonSTDList.append(np.nanstd(londegs))
                else:
                    longitudeList.append(None)
                    lonSTDList.append(None)                    


    positionDF['Digitizer'] = digitizerList
    positionDF['yyyyjjj'] = yyyyjjjList
    if len(latitudeList)>0:
        positionDF['Latitude'] = latitudeList
        positionDF['LatSTD'] = latSTDList
    if len(longitudeList)>0:
        positionDF['Longitude'] = longitudeList
        positionDF['LonSTD'] = lonSTDList
        positionDF = positionDF.round(decimals=5)

    positionDF = positionDF.astype({'Digitizer': 'str', 'yyyyjjj': 'str'})
    positionDF.to_csv(summary1file, index=False)

    return summary1file, positionDF


def create_summary9file(SVCDIR, RAWDIR, positionDF):
    # Parse RT130 9/* files to reconstruct digitizer-station history
    summary9file = os.path.join(SVCDIR, 'summary9file.csv')
    lod = []
    
    STATIONS = ['BHP', 'TANK', 'FIRE', 'BCHH', 'DVEL', 'RBLAB']
    dayfullpaths = sorted(glob.glob('%s/20?????' % RAWDIR))
    if os.path.exists(summary9file):
        os.remove(summary9file)
    count = 0
    
    for thisdayfullpath in dayfullpaths:
        count = count + 1
        print('Processing %s (%d of %d)' % (thisdayfullpath, count, len(dayfullpaths) ))#, end="\r", flush=True)
        thisdaydir = os.path.basename(thisdayfullpath) # a directory like 2018365
        rt130files = sorted(glob.glob('%s/????/9/*' % thisdayfullpath))
        
        if rt130files:
            
            for rt130file in rt130files:
                output = os.popen('strings %s' % rt130file).read()
                if output: 
                    for station in STATIONS:
                        firstindex = output.find(station)
                        if firstindex > -1:
                            break
                        
                    if firstindex != -1:
                        pathparts = rt130file.split('/')
                        rt130 = pathparts[-3]
                        # use the matched name; slicing 4 characters truncated 5-letter codes such as RBLAB
                        lod.append({'Digitizer':rt130, 'yyyyjjj':thisdaydir, 'Station':station})

    if lod:
        stationDF = pd.DataFrame(lod)
        stationDF = stationDF.astype({'Digitizer': 'str', 'yyyyjjj': 'str'})
        combinedDF = pd.merge(positionDF, stationDF, on=['Digitizer', 'yyyyjjj'], how="left")
        combinedDF.to_csv(summary9file, index=False)
        return summary9file, combinedDF
    else:
        return '', pd.DataFrame()

#############################
####  STEP TWO FUNCTIONS ####
#############################
def commandExists(command):
    # commandExists checks if PASSOFT/DMC commands are installed before we try to use them
    output = os.popen('which %s' % command).read()
    if output:
        return True
    else:
        print('Command %s not found.' % command)
        print('Make sure the PASSOFT tools are installed on this computer, and available on the $PATH')
        return False

def getpaths(SVCDIR):
    paths={}
    paths['RAWDIR'] = os.path.join(SVCDIR, 'RAW') 
    paths['LOGSDIR'] =  os.path.join(SVCDIR, 'LOGS')
    paths['CONFIGDIR'] = os.path.join(SVCDIR, 'CONFIG')
    paths['MSEEDDIR'] = os.path.join(SVCDIR, 'MSEED')
    paths['DAYSDIR'] = os.path.join(SVCDIR, 'DAYS')   
    return paths

def dirsmake(topdir, dirlist):
    print(topdir, dirlist)
    if not os.path.isdir(topdir):
        try:
            print(f'Attempting to make {topdir}')
            os.makedirs(topdir)
        except OSError:
            print("%s does not exist and could not be created. Exiting" % topdir)
            raise SystemExit("Killed!") 

    for thissubdir in dirlist:
        if not os.path.exists(thissubdir):
            print('Need to make %s' % thissubdir)
            os.mkdir(thissubdir)
            if not os.path.isdir(thissubdir):
                print("%s does not exist & could not be created. Exiting" % thissubdir)
                raise SystemExit("Killed!")         
#rt130file = os.path.join(REFTEKDIR, 'SVC06', 'RAW', '2019191', '92B7', '1', '000000000_0036EE80')
#read_rt130_1_file(rt130file)

In [None]:
# this is a fudge after writing all the code below to use all the data from the PASSCAL laptop backup, which seems the most complete

ALTREFTEKDIR = '/data/KSC/duringPASSCAL/PASSCAL_laptop_backup'
alldaysdir = os.path.join(ALTREFTEKDIR, 'REFTEK_DATA', 'SORTED')
REFTEKDIR = '/home/thompsong/work/PROJECTS/KSCpasscal'
svcalldir = os.path.join(REFTEKDIR, 'CONVERT', 'SVCall')
paths = getpaths(svcalldir)
dirlist=[]
for k in paths:
    dirlist.append(paths[k])
dirsmake(svcalldir, dirlist)
os.system(f"cp -rn {alldaysdir}/* {paths['RAWDIR']}")


In [None]:
os.system(f"mv {paths['RAWDIR']}/SORTED/* {paths['RAWDIR']}/")
os.system(f"rmdir {paths['RAWDIR']}/SORTED/")

## STEP 1. Create summary files and reconstruct digitizer-station-latitude versus time combinations. 
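
One way to turn the daily position summary into a list of likely digitizer moves is to compare each day's latitude with the previous valid value for the same digitizer. The sketch below is an illustration only: `find_position_changes` and the fixed 0.001 degree threshold are assumptions, and a fuller test could use the `LatSTD` column instead of a fixed threshold. The sample rows are rounded from the example CSV in the `create_summary1file` docstring.

```python
def find_position_changes(rows, threshold_deg=0.001):
    """Return (digitizer, previous_day, day, lat_change) for each latitude jump.

    rows: iterable of (digitizer, yyyyjjj, latitude); latitude may be None.
    """
    changes = []
    last_seen = {}  # digitizer -> (yyyyjjj, latitude) of the most recent valid fix
    # Fixed-width yyyyjjj strings sort chronologically
    for digitizer, day, lat in sorted(rows, key=lambda r: (r[0], r[1])):
        if lat is None:
            continue
        if digitizer in last_seen:
            prev_day, prev_lat = last_seen[digitizer]
            if abs(lat - prev_lat) > threshold_deg:
                changes.append((digitizer, prev_day, day, lat - prev_lat))
        last_seen[digitizer] = (day, lat)
    return changes

rows = [
    ("92B7", "2018222", 28.0595),
    ("92B7", "2018223", 28.0595),
    ("92B7", "2018229", 28.57346),
    ("92B7", "2018230", 28.57347),
]
for digitizer, prev_day, day, dlat in find_position_changes(rows):
    print(f"{digitizer} moved between {prev_day} and {day} (latitude change {dlat:+.5f})")
```

With these rows, only the step between 2018223 and 2018229 exceeds the threshold; the change of about 0.00001 degrees on the following day is treated as GPS scatter.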


In [None]:
REFTEKDIR = '/home/thompsong/work/PROJECTS/KSCpasscal'

#ServiceRunDirs = sorted(glob.glob(os.path.join(REFTEKDIR, 'SVC??')))
ServiceRunDirs = sorted(glob.glob(os.path.join(REFTEKDIR, 'CONVERT', 'SVCall'))) # hack for 
print(ServiceRunDirs)
for SVCDIR in ServiceRunDirs:
    print('Setting paths for relative directories/files')
    paths = getpaths(SVCDIR)

    print('Creating outline directory structure')
    dirlist=[]
    for k in paths:
        dirlist.append(paths[k])
    dirsmake(REFTEKDIR, dirlist)
    print('Outline directory structure created/exists')

    # Create summary files from 1/* and 9/* files
    summary1file, positionDF = create_summary1file(SVCDIR, paths['RAWDIR'])
    if len(positionDF)>0:
        summary9file, combinedDF = create_summary9file(SVCDIR, paths['RAWDIR'], positionDF)
        if not combinedDF.empty:
            display(combinedDF)
            # march through the dataframe chronologically, finding any statistically significant changes in GPS coordinates from one day to another for same digitizer-station combo
        else:
            display(positionDF)
            # march through the dataframe chronologically, finding any changes from one day to another in digitizer-station combo


        

## STEP 2. Create the parameter file(s) and run rt2ms
The parameter file is used by *rt2ms* to assign header information to the miniSEED files. *rt2ms* is a PASSCAL program that generates miniSEED formatted files from REFTEK RT130 raw files. In addition, *rt2ms* also modifies the headers. In the SVC1 directory, use a text editor and information from your field notes to create an ASCII parameter file (parfile) following the examples at https://www.passcal.nmt.edu/webfm_send/3035.

<b>Glenn's variations</b>: The PASSCAL instructions assume you construct a par file by hand for each network layout. Instead, I construct an Antelope-style *dbbuild_batch* pf file by hand for each network layout, and then use the PASSOFT program *batch2par* to convert it to a par file, followed by two *sed* (Unix stream editor) commands that patch the output. The *batch2par* and *sed* commands appear in the code below.

<b>Update 20241110</b>
Today I downloaded and installed the latest conda version of passoft3, and this no longer includes batch2par. Thus, I need to hand-edit the par files.
Furthermore, rt2ms has changed. It no longer has -F, -L or -Y command line arguments. I'm trying to work out how to use it - it seems to require a different folder organization, as it doesn't like multiple digitizer folders in the same YYYYJJJ directory.
And finally, the *.par format has changed. A blank 'location' column is now needed, as in network.station.location.channel (SEED id), and a final 'implement_time' column is needed, in UTCDateTime.isoformat() form with 6-digit microseconds and perhaps a 'Z' on the end.
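
As an illustration of the new columns, the sketch below formats an `implement_time` string and assembles one par-file row using only the standard library. The column order follows the reformatting code in this notebook; the helper names and the example values (network code `XX` in particular) are hypothetical, and the exact format *rt2ms* accepts should be checked against its documentation.

```python
from datetime import datetime

def implement_time_string(yyyymmdd):
    """Format a YYYYMMDD date as ISO 8601 with 6-digit microseconds and a trailing 'Z'."""
    t = datetime.strptime(yyyymmdd, "%Y%m%d")
    return t.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

def par_row(das, refchan, refstrm, netcode, station, channel,
            samplerate, gain, implement_time, location=""):
    """Join the fields in the assumed column order, with an empty location by default."""
    fields = [das, refchan, refstrm, netcode, station, location,
              channel, samplerate, gain, implement_time]
    return "; ".join(str(f) for f in fields)

when = implement_time_string("20180810")
print(when)  # 2018-08-10T00:00:00.000000Z
print(par_row("92B7", 1, 1, "XX", "BHP", "HHZ", 200, 32, when))
# 92B7; 1; 1; XX; BHP; ; HHZ; 200; 32; 2018-08-10T00:00:00.000000Z
```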

In [None]:
import obspy
ServiceRunDirs = sorted(glob.glob(os.path.join(REFTEKDIR, 'SVC??')))
for SVCDIR in ServiceRunDirs:
    print('Setting paths for relative directories/files')
    paths = getpaths(SVCDIR)
    print(paths)
    #RT2MS_OUTPUT = os.path.join(LOGSDIR, 'rt2ms.out')
    
    # For each dbbuild_batch pf file, ensure there is a corresponding par file (create it if the batch2par program is installed)
    pffile = sorted(glob.glob('%s/locations*.pf' % paths['CONFIGDIR']))[0] # changed network to locations. assume there is only one file per SVCDIR
    parfile = pffile[:-2] + 'par' # 
    print('- %s, %s' % (pffile, parfile)) 
            
    # Create the corresponding parfile if it does not already exist
    if not os.path.exists(parfile): 
        if commandExists('batch2par'): 
            os.system("batch2par %s -m > %s" % (pffile, parfile))
            if os.path.exists(parfile): 
                # Edit the par file
                os.system("sed -i -e 's/rs200spsrs;/1;         /g' %s" % parfile)
                os.system("sed -i -e 's/x1/32/g' %s" % parfile);  
            else:
                print("- batch2par failed")
                raise SystemExit("Killed!")
        else:
            raise SystemExit("batch2par program is not installed")


    # Convert par file to new rt2ms compatible format with location and implement_time columns
    #parfile = sorted(glob.glob('%s/locations*.par' % CONFIGDIR))[0] # changed network to locations
    os.system(f"cat {parfile}")
    tmpparfile = parfile.replace('.par', '.par.reformatted')
    # raw f-string so that the sed escape \+ is passed through without an invalid-escape warning
    os.system(rf'sed -e "s/[[:space:]]\+/ /g"  {parfile} > {tmpparfile}')
    print('removing comment\n')
    tmpparfile2 = parfile.replace('.par', '.par.uncommented')
    os.system(f'sed -e "1s/#//" {tmpparfile} > {tmpparfile2}')
    df = pd.read_csv(tmpparfile2, sep=';', index_col=None)
    df['location']=''
    datestr = os.path.basename(parfile).replace('locations','').replace('.par','')
    df['implement_time'] = obspy.UTCDateTime.strptime(datestr, '%Y%m%d').isoformat()+'.000000Z'
    newparfile = parfile.replace('.par', '.par.new')
    df = df[['das',' refchan', ' refstrm', ' netcode', ' station', 'location', ' channel', ' samplerate', ' gain', 'implement_time']]
    df.to_csv(newparfile, index=False, sep=';')
    # raw f-string so that the sed escapes (\;, \s, \+) are passed through unchanged
    os.system(rf"sed -i 's/\;/\; /g; s/\;\s\+/\; /g' {newparfile}")
    os.system(f"cat {newparfile}")

## STEP 3: Convert your data into miniSEED files. 
In the service run directory, convert the raw RT130 data to miniSEED. Typing *rt2ms -h* shows a list of available options. 

If raw data is in decompressed folders, use the following commands: 

    ls -d SVC1/RAW/*.cf > file.lst
    rt2ms -F file.lst -Y -L -o MSEED/ -p <parfile> >& rt2ms.out 

The (-F) flag will process all files in the named list, (-Y) puts the data in yearly directories, (-L) outputs .log and, if created, .err files, (-o) creates an output directory, MSEED, and (-p) points to your parfile. 

If raw data is in ZIP files: 

    rt2ms -D SVC1/RAW/ -Y -L -o MSEED/ -p <parfile> >& rt2ms.out 

The (-D) flag will process all .ZIP files in a specified directory, instead of in a file list as in the previous example. 

When *rt2ms* finishes, move all of your log and .err files from the MSEED directory to the LOGS directory that you created in step 1. 
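
The move of log and error files can be scripted. A minimal sketch using only the standard library follows; `move_rt2ms_logs` is a hypothetical helper, and it assumes the .log and .err files sit at the top level of the MSEED directory, as described in this step.

```python
import shutil
import tempfile
from pathlib import Path

def move_rt2ms_logs(mseed_dir, logs_dir, suffixes=(".log", ".err")):
    """Move top-level .log/.err files from mseed_dir to logs_dir; return the names moved."""
    logs_dir = Path(logs_dir)
    logs_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(Path(mseed_dir).iterdir()):
        if path.is_file() and path.suffix in suffixes:
            shutil.move(str(path), str(logs_dir / path.name))
            moved.append(path.name)
    return moved

# Demonstrate on a temporary MSEED/LOGS pair
with tempfile.TemporaryDirectory() as svc_dir:
    mseed = Path(svc_dir) / "MSEED"
    (mseed / "Y2014").mkdir(parents=True)
    (mseed / "2014.019.21.29.16.98EZ.log").write_text("example log\n")
    print(move_rt2ms_logs(mseed, Path(svc_dir) / "LOGS"))
```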

After running *rt2ms* the MSEED directory structure should look something like the example below. In the MSEED directory there will be .log files and possibly .err files along with a subdirectory for each year that contains day directories for each stream.

<pre>
    MSEED/
        2014.019.21.29.16.98EZ.log
        2014.019.21.29.16.98EZ.err
        Y2014/
            R065.01/
            R065.02/
            R065.03/
</pre>
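
The year and day directory names can be decoded programmatically when looping over the output. The sketch below assumes, based on the example tree above, that `Y2014` encodes the year and that `R065.01` encodes day-of-year 065 and stream 01; `parse_day_dir` is a hypothetical helper.

```python
import re
from datetime import datetime

def parse_day_dir(year_dir, day_dir):
    """Return (date, stream) for names such as 'Y2014' and 'R065.01'."""
    m_year = re.fullmatch(r"Y(\d{4})", year_dir)
    m_day = re.fullmatch(r"R(\d{3})\.(\d{2})", day_dir)
    if not (m_year and m_day):
        raise ValueError(f"unexpected directory names: {year_dir}/{day_dir}")
    # Combine year and day-of-year, then let strptime resolve the calendar date
    date = datetime.strptime(m_year.group(1) + m_day.group(1), "%Y%j").date()
    return date, int(m_day.group(2))

print(parse_day_dir("Y2014", "R065.01"))  # (datetime.date(2014, 3, 6), 1)
```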

In [None]:
ServiceRunDirs = sorted(glob.glob(os.path.join(REFTEKDIR, 'SVC??')))
for SVCDIR in ServiceRunDirs:
    print('Setting paths for relative directories/files')
    paths = getpaths(SVCDIR)
    print(paths)
    #RT2MS_OUTPUT = os.path.join(LOGSDIR, 'rt2ms.out')

    NEWSVCDIR = os.path.join(REFTEKDIR, 'CONVERT', os.path.basename(SVCDIR))
    newpaths = getpaths(NEWSVCDIR)
    print('Creating outline directory structure')
    dirlist=[]
    for k in newpaths:
        dirlist.append(newpaths[k])
    dirsmake(NEWSVCDIR, dirlist)
    
    parfile = sorted(glob.glob(os.path.join(paths['CONFIGDIR'], '*.par.new')))[0]
    newparfile = os.path.join(newpaths['CONFIGDIR'], os.path.basename(parfile))
    os.system(f"cp {parfile} {newparfile}")
                           
    """ Build a temporary folder structure so that we can process it with new rt2ms which expects data straight off a CompactFlash card
    e.g. TEST/YYYYJJJ/datalogger, but for just one datalogger
    """
    summary1file = os.path.join(SVCDIR, 'summary1file.csv')
    if os.path.isfile(summary1file):
        positionDF = pd.read_csv(summary1file, index_col=None)
        allDigitizers = list(set(positionDF['Digitizer'].to_list()))
    else:
        allDigitizers = []
    for datalogger in allDigitizers:
        dayfullpaths = sorted(glob.glob('%s/20?????' % paths['RAWDIR']))
        CFDIR = os.path.join(newpaths['RAWDIR'], f'RT130-{datalogger}-1.cf')
        if not os.path.isdir(CFDIR):
            os.makedirs(CFDIR)
        
        for thisdayfullpath in dayfullpaths:
            print('Processing %s' % thisdayfullpath)
            thisdaydir = os.path.basename(thisdayfullpath) # a directory like 2018365
            dataloggerdir = os.path.join(thisdayfullpath, datalogger)
            outputdir = os.path.join(CFDIR, thisdaydir)
            print(dataloggerdir, '\t->\t',outputdir)
            if os.path.isdir(dataloggerdir):
                if not os.path.isdir(outputdir):
                    os.makedirs(outputdir)
                
                os.system(f"cp -rn {dataloggerdir} {outputdir}")

        """
        # run rt2ms
        RT2MS_OUTPUT = os.path.join(CFDIR, 'rt2ms.out')
        cmd = f"rt2ms -d {CFDIR} -p {newparfile} -o {newpaths['MSEEDDIR']} > {RT2MS_OUTPUT}"
        print(cmd)
        os.system(cmd)    

        # move all *.log files to the LOGS directory
        for src_file in Path(newpaths['MSEEDDIR']).glob('*.log'):
            shutil.copy(src_file, newpaths['LOGSDIR'])

        # move all *.err files to the LOGS directory
        for src_file in Path(newpaths['MSEEDDIR']).glob('*.err'):
            shutil.copy(src_file, newpaths['LOGSDIR'])  
        """

<b>We also want to track the digitizer/lat/lon/station changes from the summary9file</b>

In [None]:
ServiceRunDirs = sorted(glob.glob(os.path.join(REFTEKDIR, 'SVC??')))
summaryallcsv = os.path.join(REFTEKDIR, 'CONVERT', 'summaryall.csv')
lod = []
for SVCDIR in ServiceRunDirs:
    print('Setting paths for relative directories/files')
    paths = getpaths(SVCDIR)
    NEWSVCDIR = os.path.join(REFTEKDIR, 'CONVERT', os.path.basename(SVCDIR))
    newpaths = getpaths(NEWSVCDIR)
    
    # Create summary files from 1/* and 9/* files
    summary1file = os.path.join(SVCDIR, 'summary1file.csv')
    summary9file = os.path.join(SVCDIR, 'summary9file.csv')
    
    if os.path.isfile(summary9file):
        shutil.copy(summary9file, os.path.join(newpaths['CONFIGDIR'], 'summary9file.csv'))
        df = pd.read_csv(summary9file, index_col=None)
    else:
        shutil.copy(summary1file, os.path.join(newpaths['CONFIGDIR'], 'summary1file.csv'))
        df = pd.read_csv(summary1file, index_col=None)

    df.drop_duplicates(inplace=True)
    #print(df)
    if 'Station' in df.columns:
        # Label missing stations explicitly; astype(str) alone would turn NaN into the string 'nan'
        df['Station'] = df['Station'].fillna('Unknown').astype(str)
    for digitizer in df['Digitizer'].unique():
        subdf = df[df['Digitizer']==digitizer]  
        if not 'Station' in subdf.columns:
            print(NEWSVCDIR, digitizer, 'no stations')
        else:
                     
            for station in subdf['Station'].unique():
                stationdf = subdf[subdf['Station']==station]
            
                print(NEWSVCDIR, digitizer, station, stationdf.iloc[0]['yyyyjjj'], stationdf.iloc[-1]['yyyyjjj'], 
                      f"{stationdf['Latitude'].median():.5f}, {stationdf['Longitude'].median():.5f}, \
                      {stationdf['Latitude'].std():.5f}, {stationdf['Longitude'].std():.5f}")
                if not isinstance(station, str):
                    station = 'Unknown'
                thisdict = {'Digitizer':digitizer, 'Station':station, 'StartDate':stationdf.iloc[0]['yyyyjjj'], 
                            'EndDate':stationdf.iloc[-1]['yyyyjjj'], 
                            'MedianLatitude':f"{stationdf['Latitude'].median():.5f}", 
                            'MedianLongitude': f"{stationdf['Longitude'].median():.5f}"}            
                lod.append(thisdict)
            outdf = pd.DataFrame(lod)
            outdf.to_csv(summaryallcsv, index=False)

    """

    # Print the first and last dates of each digitizer-station combo
    output = os.popen('sort %s | uniq' % summary9file).read()
    #print(output)
    lastStationDigitizerCombo = ""
    lastline = ""
    for thisline in output.split('\n'):
        try:
            (station, yyyyjjj, digitizer)  = thisline.split(',')
            thisStationDigitizerCombo = '%s%s' % (station, digitizer)
            #print(lastStationDigitizerCombo, thisStationDigitizerCombo)
            if lastStationDigitizerCombo != thisStationDigitizerCombo:
                print(lastline)
                print(thisline)
            lastStationDigitizerCombo = thisStationDigitizerCombo
            lastline = thisline
        except:
            pass
    print(lastline)

    ####################################################################################################
    # Detect dates on which a digitizer was moved by examining plots of GPS Latitude & standard deviation.
    ####################################################################################################

    # Convert yyyyjjj strings to datetime objects
    dates = list()
    for thisyyyyjjj in positionDF['yyyyjjj']:
        thisdate = datetime.datetime.strptime(thisyyyyjjj, "%Y%j").date()
        dates.append(thisdate)
    positionDF['dates']=dates

    allDigitizers = list(set(positionDF['Digitizer'].to_list()))
    fig = plt.figure()
    ax = fig.add_subplot(111)
    for digitizer in allDigitizers:
        subsetDF = positionDF[positionDF['Digitizer']==digitizer]
        ax.plot_date(subsetDF['dates'], subsetDF['Latitude'],'.', label=digitizer)  
        
    #xt = ax.get_xticks()
    #ax.set_xticks()
    ax.set_xlim(dates[0],dates[-1])
    for xtl in ax.get_xticklabels():
        xtl.set_rotation(30)
        xtl.set_horizontalalignment('right')
    plt.legend()
    plt.title(os.path.basename(SVCDIR))
    ax.set_ylim(28.51, 28.58)
    plt.savefig('digitizer_lats.png')
    ax.set_ylim(28.572, 28.575)
    plt.savefig('digitizer_lats_zoomed.png')
    #plt.tight_layout()    
    """

<hr/>
(I have not reworked the following code yet)
<hr/>

## STEP 4: Reorganize the miniSEED data into station/channel/day volumes.

*dataselect* is a DMC program that allows for the extracting and sorting of miniSEED data (https://github.com/iris-edu/dataselect). This will read the data from the MSEED directory and convert them into day volumes with the required naming format: 

    dataselect -A DAYS/%s/%s.%n.%l.%c.%Y.%j MSEED/Y*/*/* 

The (-A) flag writes file names in the specified custom format. The format flags are (s) for station, (n) for netcode, (l) for location, (c) for channel name, (Y) for year, and (j) for Julian date. See the help menu for more details on options (*dataselect -h*). Depending on how much data you have, you may need to run *dataselect* in a loop that runs over the different days or stations in your experiment.
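
To see what a given format string produces before running *dataselect* on real data, the flags described above can be expanded in a few lines of Python. This is only an illustration of the naming scheme, not a reimplementation of *dataselect*: `expand_archive_format` is a hypothetical helper that handles just these six flags and assumes a 4-digit year and a zero-padded 3-digit day.

```python
import re

def expand_archive_format(fmt, station, network, location, channel, year, julian_day):
    """Expand the %s, %n, %l, %c, %Y and %j flags into a path string."""
    values = {
        "s": station,
        "n": network,
        "l": location,  # an empty location leaves two adjacent dots in the name
        "c": channel,
        "Y": f"{year:04d}",
        "j": f"{julian_day:03d}",
    }
    return re.sub(r"%([snlcYj])", lambda m: values[m.group(1)], fmt)

print(expand_archive_format("DAYS/%s/%s.%n.%l.%c.%Y.%j", "BA01", "XR", "", "HHZ", 2018, 39))
# DAYS/BA01/BA01.XR..HHZ.2018.039
```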

<b>Please note:</b> PASSCAL wants data to be organized in BUD format. *dataselect -h* reveals that a BUD format option is built in, so I attempt to use that here instead. It takes the output base directory as its argument, which makes the command simpler: 

    dataselect -BUD DAYS/ MSEED/Y*/*/* 
    
However, I do loop over year and day directories (as suggested) so that *dataselect* is not trying to deal with a file list that is too long for it or the operating system to handle.

(Or I could do this in ObsPy)

In [None]:
if commandExists('dataselect'): 
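    # SCAFFOLD. I need to install this from GitHub, and then figure out how to
    # substitute for station, netcode, location, channel name, year and julian day.
    # May need to loop over different stations (I already do by day).
    # NOTE: MSEEDDIR and DAYSDIR are not defined earlier in this notebook; set them
    # (e.g. from getpaths(SVCDIR)) before running this cell.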
    mseedyearfullpaths = sorted(glob.glob('%s/Y20??' % MSEEDDIR))
    for thismseedyearfullpath in mseedyearfullpaths:
        print('Processing %s' % thismseedyearfullpath)  
        mseeddayfullpaths = sorted(glob.glob('%s/R*.01' % thismseedyearfullpath))
        for thismseeddayfullpath in mseeddayfullpaths:
            print('dataselect: Processing %s' % thismseeddayfullpath)
            #os.system('dataselect  -A %s/%s/%s.%n.%l.%c.%Y.%j  %s/*.m' % (DAYSDIR, thismseeddayfullpath))
            os.system('dataselect -v -BUD %s %s/*.m' % (DAYSDIR, thismseeddayfullpath) )

## STEP 5: Confirm your station and channel names
In the DAYS folder just created by dataselect, check to see if you have folders for each of your stations. The data should be organized into those folders in station/channel/day volumes named STA.NET.LOC.CHAN.YEAR.JULDAY. For example: BA01.XR..HHZ.2018.039 (The .. after XR is where the location code would be if needed).

If your parfile was incomplete (i.e. missing stations or channels), there will be one or more folders named with the RT130 serial numbers (e.g. 9306) instead of the desired station name (e.g. ME42). To change any miniSEED headers to correct a station name, network code, etc., see the *fixhdr* doc on the PASSCAL website (see link on the first page). After you have modified the headers with *fixhdr*, rename the files so that the station‐network‐location‐channel codes in the miniSEED file names match the corrected headers.  
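
The folder and file-name checks described in this step can be automated. The sketch below assumes the DAYS/STA/STA.NET.LOC.CHAN.YEAR.JULDAY layout described above and a list of expected station names taken from your own parfile; `check_days_dir` is a hypothetical helper. Folder names are compared against the expected stations rather than matched against a serial-number pattern, because a station name such as BA01 can itself look like a hexadecimal serial number.

```python
import tempfile
from pathlib import Path

def check_days_dir(days_dir, expected_stations):
    """Return (unexpected_dirs, misnamed_files) found under days_dir."""
    unexpected, misnamed = [], []
    for sta_dir in sorted(p for p in Path(days_dir).iterdir() if p.is_dir()):
        if sta_dir.name not in expected_stations:
            unexpected.append(sta_dir.name)  # e.g. a folder named after an RT130 serial number
        for f in sorted(p for p in sta_dir.iterdir() if p.is_file()):
            parts = f.name.split(".")
            # Expect STA.NET.LOC.CHAN.YEAR.JULDAY, with STA matching the folder name
            ok = (len(parts) == 6 and parts[0] == sta_dir.name
                  and len(parts[4]) == 4 and parts[4].isdigit()
                  and len(parts[5]) == 3 and parts[5].isdigit())
            if not ok:
                misnamed.append(f.name)
    return unexpected, misnamed

# Demonstrate with one correctly named station and one serial-number folder
with tempfile.TemporaryDirectory() as days:
    for sta in ("BA01", "9306"):
        (Path(days) / sta).mkdir()
        (Path(days) / sta / f"{sta}.XR..HHZ.2018.039").write_text("")
    print(check_days_dir(days, expected_stations={"BA01", "ME42"}))  # (['9306'], [])
```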

## STEP 6. Perform quality control of waveforms and logs. 
Verify the data quality by reviewing the traces and log files (with *logpeek* and *pql*). Obvious signs of trouble include loss of GPS timing, overlaps, gaps, corrupted files, etc. Make a note of any problems. Use *fixhdr* to correct or mark timing issues, and/or to convert the files to big-endian byte order if they are not already in it. For more information on how to use these tools, refer to the appropriate documentation on the PASSCAL website (see link on the first page).
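
To make "gaps" and "overlaps" concrete, the sketch below finds them in a list of (start, end) time segments given in seconds. It is a simplified illustration, not a substitute for *logpeek* or *pql*: `find_gaps_and_overlaps` is a hypothetical helper, and it ignores the sample interval that a full check would allow between consecutive segments.

```python
def find_gaps_and_overlaps(segments, tolerance=0.0):
    """Return (gaps, overlaps) as lists of (start, end) for segments in seconds."""
    ordered = sorted(segments)
    if not ordered:
        return [], []
    gaps, overlaps = [], []
    covered_until = ordered[0][1]  # latest end time seen so far
    for start, end in ordered[1:]:
        if start - covered_until > tolerance:
            gaps.append((covered_until, start))
        elif covered_until - start > tolerance:
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps

segments = [(0, 10), (5, 8), (12, 20)]
print(find_gaps_and_overlaps(segments))  # ([(10, 12)], [(5, 8)])
```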

## STEP 7. Create metadata for your experiment. 
Use *Nexus* to generate a stationXML file for your experiment metadata. See the “Metadata Generation with Nexus in a Nutshell” document on the PASSCAL website (see link on first page).

## STEP 8. Send miniSEED data to PASSCAL. 
Please send a note, with your PASSCAL project name in the subject, to data_group@passcal.nmt.edu before sending the data to PASSCAL so that we can set up a receiving area. Attach the stationXML created with Nexus to this email unless it is larger than 5 MB. Use our tool *data2passcal* to send the data: 

    data2passcal DAYS/ 

*data2passcal* will scan all subdirectories of the DAYS folder and send any miniSEED files that have the correct file names.
