# RSAM notebook

In this tutorial, we will explore Real-time Seismic Amplitude Measurement (RSAM) data. RSAM is, by definition, computed on raw seismic data, so we can also think of it as "Raw" Seismic Amplitude Measurement, to distinguish it from similar measurements we will make later on velocity and displacement seismograms.

## 1. Background

### 1.1 Motivation

Imagine it is Spring 1985, and you are the only seismologist at the USGS Cascades Volcano Observatory (CVO). Tremor is appearing on the helical drum recorders, as it has before most of the explosive eruptions over the past 5 years. The authorities want to know if the tremor now is as strong as it was right before the catastrophic May 18, 1980 sector collapse.

Volcano-seismic monitoring was simple, and largely consisted of:
1. Counting the number of earthquakes each day on the drum records ("daily counts")
2. Locating and mapping volcano-tectonic earthquakes, and estimating their magnitudes ("catalog production/analysis")
3. During heightened times of unrest, manning an Operations Room 24-7 with analysts continuously watching the drums, and communicating with field crews by 2-way radio ("real-time monitoring")

So all you have is the drum records (hundreds of large sheets of paper) and the catalog. You don't have any digital version of the continuous seismic data sitting on a hard drive, or on a CD. Why? 
- CD-ROM drives didn't appear until ~1990
- hard drive storage was too expensive. Here is a quick calculation:

In [None]:
# Algorithm to compute raw storage space needed for seismic data
def storage_space(samplingRate=100, bitsPerSample=32, numComponentsPerStation=3, numStations=10):
    BITS_PER_BYTE = 8
    SECONDS_PER_DAY = 60 * 60 * 24
    bytesPerGb = 1024**3
    gbPerDayPerChannel = (samplingRate * (bitsPerSample/BITS_PER_BYTE) * SECONDS_PER_DAY) / bytesPerGb
    gbPerDayNetwork = gbPerDayPerChannel * numComponentsPerStation * numStations
    print(f"Raw data requires {gbPerDayNetwork:.02f} GB of storage per day, and {gbPerDayNetwork * 365:.0f} GB per year")

    dollarsPerTB = {'1985':31400000, '2000':4070, '2023':14.3}
    print("\nStorage cost for 1 year of data, in different years:")
    for key in dollarsPerTB:  
        print(f"{key}: US${(gbPerDayNetwork * 365 * dollarsPerTB[key]/1024):,.0f}")

    print("Data from https://ourworldindata.org/grapher/historical-cost-of-computer-memory-and-storage")


storage_space(samplingRate=100, bitsPerSample=32, numComponentsPerStation=3, numStations=10)
    

So back in 1985, hard drive storage for just one year of data from the Mount St. Helens seismic network would have cost ~US$10 Million!
Given these costs, STA/LTA algorithms were used to capture anomalous signals - volcanic earthquakes - while the continuous data were generally discarded (or at best, recorded to tape).

Anyway, so you don't have an easy way to compare tremor levels. But you sure as hell aren't going to be caught in this situation again! So what can you do? <em><font color='green'>You can store a massively downsampled version of the continuous seismic data instead!</font></em>

This idea led to the Real-time Seismic Amplitude Measurement (RSAM) system.

### 1.2 Original RSAM system

The RSAM system was built around an 8-bit analog-to-digital-converter PC card, because software was too slow in those days. Components of the original RSAM system were:

<font color='blue'>
<ol>
<li>Real-time bar graphs: showing average seismic amplitudes over the last 2.56 s, 1 minute, and 10 minutes</li>
<li><b>1 minute and 10 minute mean signal amplitudes, logged to binary files. This is what most volcano-seismologists today think of as "RSAM data"!</b></li>
<li>"RSAM events": created by a simple STA/LTA detector running on each channel (NSLC)</li>
<li>Multi-station event (e.g. earthquake) and tremor alarm systems</li>
<li>Trends in RSAM data and other datasets (e.g. earthquake counts, tiltmeter data, gas flux, deformation, etc.) could be visualized with another software package called "BOB"</li>
</ol></font>

<table border=1><tr><td><img width=100% src="images/EndoMurray1991fig7.png" ></td><td>Fig 7 from Endo & Murray (1991). Top panel shows RSAM event rate at the closest station to Pinatubo. Bottom 3 panels show RSAM data from stations at increasing distances. 30 days of data are shown.</td></tr></table>

In the figure above, 30 days of RSAM data are shown for three seismic stations. Loading and plotting 30 days of raw seismic data takes a while, but 1-minute RSAM data downsamples the raw seismic data by a factor of 6,000 (assuming a 100 Hz sampling rate), so long RSAM timeseries (hours, days, weeks, months, etc.) can be quickly loaded and plotted.
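The arithmetic behind that downsampling factor can be checked with a few lines of Python. This is only a back-of-envelope sketch: the 100 Hz sampling rate and the 30-day duration are taken from the text above, and the variable names are made up for illustration.

```python
# Compare the number of samples in raw data and in 1-minute RSAM data
sampling_rate_hz = 100          # assumed raw sampling rate, as in the text
rsam_interval_s = 60            # 1-minute RSAM
num_days = 30                   # length of the time series in the figure
seconds_per_day = 60 * 60 * 24

raw_samples = sampling_rate_hz * seconds_per_day * num_days
rsam_samples = (seconds_per_day // rsam_interval_s) * num_days
downsampling_factor = raw_samples // rsam_samples

print(f"Raw samples per channel:  {raw_samples:,}")
print(f"RSAM samples per channel: {rsam_samples:,}")
print(f"Downsampling factor:      {downsampling_factor:,}")
```

So a 30-day plot of one channel needs about 43 thousand RSAM values, rather than about 259 million raw samples.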

Reference:
- Endo, E.T., Murray, T. Real-time Seismic Amplitude Measurement (RSAM): a volcano monitoring and prediction tool. Bull Volcanol 53, 533–545 (1991). __[https://doi.org/10.1007/BF00298154](pdf/RSAM_EndoMurray1991.pdf)__

## 2. Computing RSAM data

### 2.1 Simple example

Here is a minimal example of computing RSAM data from an ObsPy Stream object. The data come from station REF at Redoubt Volcano in Alaska on 2009/03/22.

In [None]:
import os
import sys
import obspy
sys.path.append('../lib')
from SAM import RSAM
st = obspy.read(os.path.join('..','..','data','continuous','SDS','2009','AV','REF','EHZ.D', 'AV.REF..EHZ.D.2009.081' ))
st.plot();
rsamObj = RSAM(stream=st, sampling_interval=600)
rsamObj.plot()

In [None]:
help(rsamObj)

In [None]:
print(rsamObj)

In [None]:
print('RSAM dataframe for one Trace id (net.sta.loc.chan):')
# The key must match the Trace id of the data loaded above (station REF, network AV)
print(rsamObj.dataframes['AV.REF..EHZ'].head())

The dataframe has just two columns, 'time' and 'mean':
- 'time' is in Unix epoch seconds (since 1970-01-01 00:00:00 UTC)
- 'mean' holds the mean seismic amplitude within each 600-s time window (sampling_interval=600 in the call above)
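To make the 'mean' column more concrete, here is a minimal NumPy sketch of a windowed mean amplitude. It is not the code inside the RSAM class: the function name `windowed_mean_amplitude` is hypothetical, and the choice to remove each window's offset and average the absolute values is an assumption about how "mean seismic amplitude" is defined. The data are synthetic noise, not the Redoubt recording.

```python
import numpy as np

def windowed_mean_amplitude(data, sampling_rate, sampling_interval):
    """Mean absolute amplitude in consecutive, non-overlapping windows."""
    samples_per_window = int(round(sampling_rate * sampling_interval))
    num_windows = len(data) // samples_per_window
    # Drop any incomplete window at the end, then make one row per window
    trimmed = np.asarray(data[:num_windows * samples_per_window], dtype=float)
    windows = trimmed.reshape(num_windows, samples_per_window)
    # Assumption: remove the offset of each window before rectifying
    windows = windows - windows.mean(axis=1, keepdims=True)
    return np.abs(windows).mean(axis=1)

# Synthetic example: 30 minutes of 100 Hz noise, 3 times stronger in the last 10 minutes
rng = np.random.default_rng(seed=0)
sampling_rate = 100.0
raw = rng.normal(0.0, 1.0, int(30 * 60 * sampling_rate))
raw[int(20 * 60 * sampling_rate):] *= 3.0

rsam_like = windowed_mean_amplitude(raw, sampling_rate, sampling_interval=600)
print(rsam_like)
```

The 180,000 raw samples collapse to three values, one per 600-s window, and the last value is roughly three times the first two.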

### 2.2 Non-trivial example



In [None]:
# Header
import os
import sys
import obspy
from obspy.clients.filesystem.sds import Client as sdsclient
sys.path.append('../lib')
import setup_paths
paths = setup_paths.paths
from SAM import RSAM

# Compute RSAM in 1-day chunks for multiple network-station-location-channel (NSLC) ids
mySDSclient = sdsclient(paths['SDS_DIR'])
startTime = obspy.core.UTCDateTime(2001,2,27)
endTime = obspy.core.UTCDateTime(2001,3,3)
secondsPerDay = 60 * 60 * 24
numDays = (endTime-startTime)/secondsPerDay
daytime = startTime
while daytime < endTime:
    print(f'Loading Stream data for {daytime}')
    st = mySDSclient.get_waveforms("MV", "*", "*", "[SBEHCD]*", daytime, daytime+secondsPerDay)
    print(f'- got {len(st)} Trace ids')
    print(f'Computing RSAM metrics for {daytime}, and saving to pickle files')
    rsamMV24h = RSAM(stream=st, sampling_interval=60)
    # Save to SAM_DIR so that the next cell can read the data back
    rsamMV24h.write(paths['SAM_DIR'])
    daytime += secondsPerDay
del mySDSclient

In [None]:
# Read all the RSAM data back, and plot
rsamObj = RSAM.read(startTime, endTime, SAM_DIR=paths['SAM_DIR'])
print(rsamObj)
rsamObj.plot(metrics='median')

RSAM objects can be processed in various ways, e.g. using select(), trim() and downsample(). Section 4 shows trim() and downsample() in use.

## 3. Legacy RSAM data 

### 3.1 Loading legacy RSAM data from binary files

The RSAM system was used at many observatories, so many of them have archives of RSAM binary files. We can read these files, making the data Interoperable and Reusable. (Tiltmeter data were saved in the same format, and so can also be read.)

Next we will load 1 year of RSAM data for 8 stations recorded by the original RSAM system that was deployed in Montserrat. 


In [None]:
import os
import sys
import glob
import obspy
sys.path.append('../lib')
import setup_paths
paths = setup_paths.paths
from SAM import RSAM

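# Note: Section 4 reads 1996 data from SAM_DIR, so repeat this conversion
# for 1996 if those files do not exist yet
stime = obspy.core.UTCDateTime(1997,1,1,0,0,0)
etime = obspy.core.UTCDateTime(1997,12,31,23,59,59)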
files = glob.glob(os.path.join(paths['SAMBINARY_DIR'], f'M???{stime.year}.DAT'))
stations = list(set([os.path.basename(file)[0:4] for file in files]))
rsamObj = RSAM.readRSAMbinary(paths['SAMBINARY_DIR'], stations, stime, etime)
print(rsamObj)
rsamObj.plot()

### 3.2 Converting legacy RSAM binary files to modern RSAM CSV/Pickle files
Since we have already read the binary files into a (single) RSAM object, writing them to modern RSAM data format is as simple as:

In [None]:
rsamObj.write(paths['SAM_DIR'], ext='csv')

## 4. RSAM data processing and analysis

### 4.1 read and plot

Next we will:
- (re-)read (from disk) the RSAM data from 1996-02-15 to 1996-10-12 for select SEED ids
- plot the data. By default, the plot() method will convert RSAM dataframes into an ObsPy Stream object, so it can be plotted in a familiar way.

In [None]:
startt = obspy.core.UTCDateTime(1996,2,15)
endt = obspy.core.UTCDateTime(1996,10,12)
rsamObj = RSAM.read(startt, endt, trace_ids=['MV.MLGT..EHZ', 'MV.MGAT..EHZ', 'MV.MRYT..EHZ', 'MV.MGHZ..EHZ'], SAM_DIR=paths['SAM_DIR'], ext='csv')
rsamObj.plot()   
print(rsamObj)

The RSAM plots above show the following general features:
1. Low seismicity in February and March.
2. An increase in seismicity around April 1 that persists through June. This period included the first pyroclastic density current (PDC) to reach the ocean, on May 12, 1996.
3. A more significant increase in activity about two-thirds of the way through July 1996. During this period, the seismicity and the lava dome extrusion rate increased significantly, leading to numerous PDCs that reached the ocean, and even travelled some distance across the water. The increase is particularly noticeable on MV.MLGT..EHZ (3rd trace), as this station was close to the Tar River Valley, where most PDCs were directed.
4. A sharp drop in seismicity from September 18, 1996, onwards.

These features may be more obvious if we smooth the data, which we can do with the downsample() method:

### 4.2 Trim and Downsample 

In [None]:
startt = obspy.core.UTCDateTime(1996,7,15)
endt = obspy.core.UTCDateTime(1996,9,1)

# trim
rsamObj.trim(starttime=startt, endtime=endt)

# downsample
rsamObjHourly = rsamObj.downsample(new_sampling_interval=3600) 

# plot
rsamObjHourly.plot()

# print
print(rsamObjHourly)
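As a rough illustration of what going from 1-minute to hourly samples involves, the sketch below averages consecutive blocks of 60 values with NumPy. The helper `downsample_by_mean` is hypothetical, and the downsample() method of the RSAM class may use a different statistic or handle gaps differently.

```python
import numpy as np

def downsample_by_mean(values, old_interval, new_interval):
    """Average consecutive blocks of samples to reach a longer sampling interval."""
    factor = int(new_interval // old_interval)
    num_bins = len(values) // factor
    # Drop any incomplete block at the end, then make one row per new sample
    binned = np.asarray(values[:num_bins * factor], dtype=float).reshape(num_bins, factor)
    # nanmean ignores gaps (NaN) within a block
    return np.nanmean(binned, axis=1)

# Three hours of made-up 1-minute values: 0, 1, ..., 179
minute_values = np.arange(180, dtype=float)
hourly = downsample_by_mean(minute_values, old_interval=60, new_interval=3600)
print(hourly)
```

Each hourly value is the mean of the 60 one-minute values it replaces, which is why short spikes are suppressed and slower trends stand out.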

There are various periods here where there seem to be cycles in RSAM. Let us look at the early August period in more detail:

In [None]:
rsamObjSummer = rsamObj.copy()
rsamObjSummer.trim(starttime=obspy.core.UTCDateTime(1996,8,1), endtime=obspy.core.UTCDateTime(1996,8,8))
rsamObjSummer.plot(kind='stream', equal_scale=False) 

These are remarkable cycles in RSAM. The peaks appear to be about 4-6 hours apart. This phenomenon is called "banded tremor". During these tremor bands, visual observations indicated that the lava dome was extruding at particularly high rates (up to 20 m^3 was one estimate I heard), and at the peak of each cycle there was often ash venting. I proposed that the tremor bands were indicative of pressure cycles within the conduit - but caused by what?
One suggestion is that the magma rises up the conduit in a stick-slip fashion: it gets stuck for a while as the pressure builds below, and then fractures in shear, allowing magma to extrude very quickly.

Can we use some ObsPy STA/LTA detection tools to detect these tremor bands, in the same way we normally detect much shorter transient events, but just with longer STA/LTA settings? Let us try first on a single NSLC. This is based on examples at https://docs.obspy.org/tutorial/code_snippets/trigger_tutorial.html, except we use longer STA and LTA time windows (15 and 100 minutes respectively), and we add a despiking step which attempts to remove transient events lasting a minute or less from the data before running the STA/LTA:


### 4.3 Tremor band detection with ObsPy trigger methods

#### 4.3.1 Single channel detection

In [None]:
from obspy.signal.trigger import plot_trigger, classic_sta_lta, recursive_sta_lta

rsamObjSummer.despike(metrics='all')
st = rsamObjSummer.to_stream()
st.trim(obspy.core.UTCDateTime(1996,8,3), obspy.core.UTCDateTime(1996,8,5))

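sta_minutes = 15
lta_minutes = 100
threshON = 1.0
threshOFF = 0.3

# recursive_sta_lta() expects window lengths in samples, so convert from minutes
# (for 1-minute RSAM data this gives 15 and 100 samples)
samplesPerMinute = 60 * st[0].stats.sampling_rate
nsta = int(round(sta_minutes * samplesPerMinute))
nlta = int(round(lta_minutes * samplesPerMinute))
cft = recursive_sta_lta(st[0].data, nsta, nlta)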

plot_trigger(st[0], cft, threshON, threshOFF)

That seems to work quite well. Now let us try an event detector that uses several NSLC at once.
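For readers who want to see what an STA/LTA characteristic function does to RSAM data, here is a simplified pure-NumPy sketch of the recursive scheme. It is not ObsPy's implementation (which is compiled code and may differ in detail). The function name `recursive_sta_lta_sketch` and the synthetic 1-minute data are made up for illustration.

```python
import numpy as np

def recursive_sta_lta_sketch(amplitudes, nsta, nlta):
    """Ratio of short-term to long-term running averages of squared amplitude."""
    csta = 1.0 / nsta
    clta = 1.0 / nlta
    sta = 0.0
    lta = 1e-99  # tiny start value avoids division by zero
    ratio = np.zeros(len(amplitudes))
    for i, a in enumerate(amplitudes):
        energy = a * a
        # Each average is updated recursively from its previous value
        sta = csta * energy + (1.0 - csta) * sta
        lta = clta * energy + (1.0 - clta) * lta
        ratio[i] = sta / lta
    ratio[:nlta] = 0.0  # ignore the warm-up period
    return ratio

# Synthetic 1-minute "RSAM": background of 10, with a 2-hour band of amplitude 50
rsam = np.full(24 * 60, 10.0)
rsam[600:720] = 50.0

# With 1-minute samples, 15 and 100 samples correspond to 15 and 100 minutes
cft = recursive_sta_lta_sketch(rsam, nsta=15, nlta=100)
print(f"ratio just before the band: {cft[599]:.2f}")
print(f"maximum ratio during the band: {cft[600:720].max():.2f}")
```

The ratio sits near 1 while the amplitude is steady, and rises sharply at the start of the band because the short-term average responds faster than the long-term one.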

#### 4.3.2 Multi-channel detection

In [None]:
from obspy.signal.trigger import coincidence_trigger
from pprint import pprint
import numpy as np

threshStations = 3

trig = coincidence_trigger("recstalta", threshON, threshOFF, st, threshStations, sta=sta_minutes*60, lta=lta_minutes*60, max_trigger_length=2*lta_minutes*60, delete_long_trigger=True)

#pprint(trig)

lendata = len(st[0].data)
trdata = np.zeros( (lendata, ) )
detectionTrace = obspy.Trace( data = trdata ) 
detectionTrace.id = 'XX.DETEC..TED'
detectionTrace.stats.starttime = st[0].stats.starttime
detectionTrace.stats.sampling_rate = st[0].stats.sampling_rate
t = detectionTrace.times('utcdatetime')
for thistrig in trig:
    t0 = thistrig['time']
    t1 = (thistrig['time'] + thistrig['duration'])
    indices = np.where((t >= t0) & (t <= t1))
    #print(t0, t1, indices)
    detectionTrace.data[indices] = 1 #thistrig['duration']

st3 = st.copy()
st3.append(detectionTrace)
st3.plot(equal_scale=False);

# Report each detection, and the interval to the next one (if there is one)
detection_ON_times = [thistrig['time'].timestamp for thistrig in trig]
detection_intervals_minutes = np.diff(np.array(detection_ON_times)) / 60
for i, thistrig in enumerate(trig):
    print(f"detection ON time for band {i}: {thistrig['time']}, duration: {thistrig['duration']/60} mins")
    if i < len(detection_intervals_minutes):
        print(f"- interval (mins): {detection_intervals_minutes[i]}")

The bottom trace here corresponds to the detected events, and you can see they line up pretty well with the tremor bands, except the first one was missed.
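The idea behind requiring several NSLC to trigger at once can be sketched with boolean arrays. This is only a conceptual illustration with made-up trigger states: ObsPy's coincidence_trigger works with per-trace trigger on/off times rather than per-sample flags. The names `triggered` and `thresh_stations` are hypothetical.

```python
import numpy as np

# One row per channel, one column per time sample; True means that channel is triggered
triggered = np.array([
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
], dtype=bool)
thresh_stations = 3

# Count how many channels are triggered at each sample
coincidence_sum = triggered.sum(axis=0)
network_on = coincidence_sum >= thresh_stations

# Find the first and last sample index of each network detection
padded = np.concatenate(([False], network_on, [False]))
edges = np.diff(padded.astype(int))
starts = np.flatnonzero(edges == 1)
ends = np.flatnonzero(edges == -1) - 1

for s, e in zip(starts, ends):
    print(f"network detection from sample {s} to sample {e}")
```

The isolated trigger on the fourth channel never produces a network detection, which is the point of a coincidence threshold: a spike on one station is ignored unless enough other stations agree.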

This is similar to the banded tremor alarm system I wrote at MVO in 2000. And using this approach we can forecast the timing of the next tremor band. As it was the MVO Seismologist's job to manage the Operations Room, which included continuous seismic monitoring and two-way radio communications with MVO field crews, it was useful to predict tremor bands, as these were periods of heightened activity when field crews should not be on the flanks of the volcano.




In [None]:
# find peak value and peak time during each band
import pandas as pd
tr = st[0]  # assumption: use the first (despiked) RSAM trace as the reference channel
lod = []
for thistrig in trig:
    bandstarttime = thistrig['time']
    bandendtime = thistrig['time'] + thistrig['duration']
    bandTrace = tr.copy().trim(starttime=bandstarttime, endtime=bandendtime)
    # time of the largest sample, measured from the first sample of the trimmed trace
    bandpeaktime = bandTrace.stats.starttime + int(bandTrace.data.argmax()) * bandTrace.stats.delta
    band = {'starttime':bandstarttime, 'waxtime':bandpeaktime-bandstarttime,
            'peaktime':bandpeaktime, 'wanetime':bandendtime-bandpeaktime, 'endtime':bandendtime, 'duration':thistrig['duration']}
    lod.append(band)


bandDf = pd.DataFrame(lod)
print(bandDf)

predicted = []
for col in ['starttime', 'peaktime', 'endtime']:
    interval = (bandDf.iloc[-1][col] - bandDf.iloc[0][col]) / (len(bandDf)-1) 
    predicted.append(bandDf.iloc[-1][col] + interval)
print('\nNext band prediction:')
print(' - start: ',predicted[0])
print(' - peak:  ',predicted[1])
print(' - end:   ',predicted[2])

In [None]:
# rsamObjMidJuly is not defined in this notebook; assume the early-August object from above
st5 = rsamObjSummer.copy()
st5.trim(starttime = obspy.core.UTCDateTime(1996,8,5,0,0,0), endtime = obspy.core.UTCDateTime(1996,8,5,3,0,0) )
st5.plot()