# Welcome to the NoisePy SCEDC Colab Tutorial!

NoisePy is a Python package for processing ambient seismic noise and computing cross correlations.

**Publication about this software**:
Chengxin Jiang, Marine A. Denolle; NoisePy: A New High‐Performance Python Tool for Ambient‐Noise Seismology. Seismological Research Letters 2020; 91 (3): 1853–1866. doi: https://doi.org/10.1785/0220190364



This tutorial walks you through the basic steps of using NoisePy to compute ambient noise cross correlation functions with a single-instance workflow.

The data is stored on AWS S3 as the SCEDC Data Set: https://scedc.caltech.edu/data/getstarted-pds.html



First, we install the noisepy-seis package. ```NoisePy``` includes ```obspy``` as a dependency, so you must **restart the runtime** after installing NoisePy.

In [None]:

! pip install --upgrade noisepy-seis



## Restart runtime

Restarting the runtime is required for obspy to load correctly on Colab instances.

## Import necessary modules

Next, we import the required modules.

In [None]:
from noisepy.seis import cross_correlate, stack, plotting_modules       # noisepy core functions
from noisepy.seis.asdfstore import ASDFCCStore                          # Object to store ASDF data within noisepy
from noisepy.seis.scedc_s3store import SCEDCS3DataStore, channel_filter # Object to query SCEDC data on S3
from noisepy.seis.datatypes import ConfigParameters                     # Main configuration object
from noisepy.seis.channelcatalog import XMLStationChannelCatalog        # Required stationXML handling object
import os
import glob


# create directory to store data locally
path = "./data"
# path = "../../data/" # for local runs
os.makedirs(path, exist_ok=True)
cc_data_path = os.path.join(path, "CCF")
stack_data_path = os.path.join(path, "STACK")

We will work with a single day's worth of SCEDC data. The continuous data are organized as one miniSEED file per channel per day (https://scedc.caltech.edu/data/cloud.html). For this example, you can choose any year since 2002; we cross-correlate just a single day.

In [None]:
year = 2002     # year of analysis
doy = 2         # day of year (1-365, or 1-366 in leap years)
# SCEDC S3 bucket prefix for that day
S3_DATA = f"s3://scedc-pds/continuous_waveforms/{year}/{year}_{doy:03d}/"  # 1 day of data
print(S3_DATA)
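The cell above builds the S3 prefix from `year` and `doy`. The sketch below wraps the same prefix layout in a helper (`scedc_day_prefix` is a hypothetical name, not part of NoisePy) that also rejects day-of-year values outside the chosen year, allowing 366 days in leap years, and returns the matching calendar date, which is useful for knowing which date the cross-correlation output will cover.

```python
import calendar
from datetime import date, timedelta

def scedc_day_prefix(year, doy):
    """Return (S3 prefix, calendar date) for one day of SCEDC continuous data (hypothetical helper)."""
    n_days = 366 if calendar.isleap(year) else 365
    if not 1 <= doy <= n_days:
        raise ValueError(f"doy must be between 1 and {n_days} for {year}")
    day = date(year, 1, 1) + timedelta(days=doy - 1)
    # same layout as S3_DATA above: <year>/<year>_<zero-padded doy>/
    prefix = f"s3://scedc-pds/continuous_waveforms/{year}/{year}_{doy:03d}/"
    return prefix, day

prefix, day = scedc_day_prefix(2002, 2)
print(prefix, day)  # s3://scedc-pds/continuous_waveforms/2002/2002_002/ 2002-01-02
```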

The station information, including the instrument response, is stored as stationXML in the following bucket:

In [None]:
S3_STATION_XML = "s3://scedc-pds/FDSNstationXML/CI/"            # S3 storage of stationXML


## Ambient Noise Project Configuration

We store the metadata for the ambient noise cross-correlation workflow in a ConfigParameters() object. We first initialize it and then tune the parameters for this cross correlation.

In [None]:
# Initialize ambient noise workflow configuration
config = ConfigParameters() # default config parameters which can be customized

Customize the job parameters below:

In [None]:
config.dt = 0.05  # (float) sampling interval (sec); constrained to be 1/samp_freq
# config.start_date: str(year)+"_"  # TODO: can we make this datetime?
# config.end_date: str = ""
config.samp_freq = 20  # (int) sampling rate in Hz for processing (can differ from the data sampling rate)
config.cc_len = 3600.0  # (float) basic unit of data length for the FFT (sec)

# criteria for data selection
config.ncomp = 3  # 1- or 3-component data (needed to decide whether to rotate)


config.acorr_only = False  # only perform auto-correlation or not
config.xcorr_only = True  # only perform cross-correlation or not

# config.inc_hours = 24 # if the data is first 

# pre-processing parameters
config.step = 1800.0  # (float) step between the starts of consecutive cc_len windows (sec)
config.stationxml = False  # stationXML file used to remove the instrument response for SAC/miniSEED data
config.rm_resp = "inv"  # 'no' to keep the instrument response, or 'inv' to remove it using the stationXML ('spectrum' is another option)
config.freqmin = 0.05  # (float) lower bound of the frequency band (Hz)
config.freqmax = 2.0  # (float) upper bound of the frequency band (Hz)
config.max_over_std = 10  # threshold for removing windows with bad signals; set it to 10**9 to keep them all

# TEMPORAL and SPECTRAL NORMALISATION
config.freq_norm = "rma"  # "rma" for soft whitening or "no" for no whitening; pure whitening is not implemented correctly at this point
config.smoothspect_N = 10  # moving window length to smooth the spectrum amplitude (points)
# here, choose smoothspect_N for the case of a strict whitening (e.g., phase_only)

config.time_norm = "no"  # 'no' for no normalization, or 'rma' or 'one_bit' for time-domain normalization
# TODO: change time_norm option from "no" to "None"
config.smooth_N = 10  # moving window length for time-domain normalization, if selected (points)

config.cc_method= "xcorr"  # 'xcorr' for pure cross correlation OR 'deconv' for deconvolution;
    # FOR "COHERENCY" PLEASE set freq_norm to "rma", time_norm to "no" and cc_method to "xcorr"

# OUTPUTS:
config.substack = True  # True: keep smaller stacks within the time chunk; False: stack over inc_hours
config.substack_len = config.cc_len  # length of each substack (for monitoring purposes); must be a multiple of cc_len
# e.g., substack=True with substack_len=2*cc_len pre-stacks every 2 correlation windows,
# while substack=True with substack_len=cc_len keeps ALL of the correlations

config.maxlag = 200  # lags of the cross-correlation to save (sec)
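To see what `cc_len`, `step`, `dt`/`samp_freq` and `substack_len` imply for one day of data, here is a small standard-library sketch. The helper names are hypothetical; the window count (every full `cc_len` window, advanced by `step`, that fits in the day) is an assumption and may differ by one from NoisePy's internal segmentation. `keep_window` only illustrates the idea behind `max_over_std` (discard a window whose peak amplitude is much larger than a reference standard deviation) and is not NoisePy's exact rule.

```python
import math
import statistics

def window_starts(total_sec, cc_len, step):
    """Start times (s) of every full cc_len window, advanced by step, that fits in total_sec."""
    n_win = int(math.floor((total_sec - cc_len) / step)) + 1
    return [i * step for i in range(n_win)]

def check_config(dt, samp_freq, cc_len, substack_len):
    """Consistency checks implied by the parameter comments (illustrative only)."""
    if not math.isclose(dt, 1.0 / samp_freq):
        raise ValueError("dt must equal 1/samp_freq")
    ratio = substack_len / cc_len
    if ratio < 1 or not math.isclose(ratio, round(ratio)):
        raise ValueError("substack_len must be a multiple of cc_len")

def keep_window(window, ref_std, max_over_std):
    """Assumed rejection rule: drop a window whose peak |amplitude| exceeds max_over_std * ref_std."""
    return max(abs(x) for x in window) <= max_over_std * ref_std

check_config(dt=0.05, samp_freq=20, cc_len=3600.0, substack_len=3600.0)
starts = window_starts(86400, 3600.0, 1800.0)   # one day, cc_len = 3600 s, step = 1800 s
print(len(starts), starts[:3], starts[-1])      # 47 [0.0, 1800.0, 3600.0] 82800.0

quiet = [1.0, -1.0] * 50
ref_std = statistics.pstdev(quiet)              # 1.0 for this toy trace
print(keep_window(quiet, ref_std, 10), keep_window(quiet[:-1] + [50.0], ref_std, 10))  # True False
```

With `step` equal to half of `cc_len`, consecutive windows overlap by 50%, which is why a single day yields 47 one-hour windows under these assumptions rather than 24.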

In [None]:
# For this tutorial, remove any output left over from a previous run
!rm -rf ./data/CCF

## Step 1: Cross-correlation



In [None]:

stations = "SBC,RIO,DEV".split(",") # filter to these stations
catalog = XMLStationChannelCatalog(S3_STATION_XML)
raw_store = SCEDCS3DataStore(S3_DATA, catalog, channel_filter(stations, "BH")) # Store for reading raw data from S3 bucket
cc_store = ASDFCCStore(cc_data_path) # Store for writing CC data

# print the configuration parameters. Some are chosen by default but we can modify them
print(config)

In [None]:
ts = raw_store.get_timespans()   # time spans available in the raw data store
raw_store.get_channels(ts[0])    # channels available for the first time span

Perform the cross correlation. This step is slow, mostly because of the instrument response removal (the interpolation may also contribute).

In [None]:
cross_correlate(raw_store, config, cc_store)
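`cross_correlate` runs the full pipeline. As a rough illustration of the core operation only, the numpy sketch below cross-correlates two windows in the frequency domain after an "rma"-style soft whitening and keeps lags up to `maxlag`. The function names are hypothetical, and the smoothing (a running mean of the spectral amplitude over `smooth_n` points) is our assumption about what `freq_norm="rma"` does, not NoisePy's implementation.

```python
import numpy as np

def smooth_amplitude(spec, n):
    """Running mean of |spec| over n points (assumed 'rma'-style smoothing)."""
    kernel = np.ones(n) / n
    return np.convolve(np.abs(spec), kernel, mode="same")

def xcorr_rma(a, b, dt, maxlag, smooth_n=10):
    """Frequency-domain cross-correlation of two equal-length windows with soft whitening.

    Returns (lags in seconds, correlation); a positive lag means `b` is delayed relative to `a`.
    """
    nfft = 2 * len(a)                       # zero-pad to avoid circular wrap-around
    A = np.fft.rfft(a, nfft)
    B = np.fft.rfft(b, nfft)
    eps = 1e-12                             # guard against division by zero
    A = A / (smooth_amplitude(A, smooth_n) + eps)
    B = B / (smooth_amplitude(B, smooth_n) + eps)
    cc = np.fft.fftshift(np.fft.irfft(np.conj(A) * B, nfft))  # zero lag moved to index nfft // 2
    lags = (np.arange(nfft) - nfft // 2) * dt
    keep = np.abs(lags) <= maxlag           # keep only |lag| <= maxlag
    return lags[keep], cc[keep]

# Toy example: b is a copy of a delayed by 5 samples (0.25 s at dt = 0.05 s)
rng = np.random.default_rng(0)
a = rng.standard_normal(2000)
b = np.concatenate([np.zeros(5), a[:-5]])
lags, cc = xcorr_rma(a, b, dt=0.05, maxlag=20.0)
print(lags[np.argmax(cc)])  # peak at the imposed 0.25 s delay
```

Dividing each spectrum by its smoothed amplitude flattens the spectrum without discarding amplitude entirely, which is the sense in which this whitening is "soft".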

Plot the cross correlations stored in a single output file.

In [None]:
file = sorted(glob.glob(os.path.join(cc_data_path, "*.h5")))[0]  # first CC file (only one day was processed)
plotting_modules.plot_substack_cc(file,0.1,1,200,False)

## Step 2: Stack the cross correlations

Note: this step is still a work in progress.

`stack` reads the cross correlations from `cc_data_path` and writes the stacks to `stack_data_path`.

In [None]:
stations = raw_store.get_station_list()
print(stations)
stack(stations, cc_data_path, stack_data_path, "linear")
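The `"linear"` argument requests a linear stack. As a minimal numpy sketch (the helper `linear_stack` is hypothetical, not NoisePy's code), a linear stack is the sample-wise mean of equally long correlation traces: the coherent part is preserved while independent noise averages down roughly as 1/sqrt(N).

```python
import numpy as np

def linear_stack(ccs):
    """Sample-wise mean of correlation traces stored as rows of a 2-D array."""
    ccs = np.asarray(ccs, dtype=float)
    if ccs.ndim != 2 or ccs.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of shape (n_traces, n_lags)")
    return ccs.mean(axis=0)

# Toy example: the same "signal" buried in independent noise in 100 traces
rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, 4 * np.pi, 401))
traces = signal + rng.standard_normal((100, signal.size))
stacked = linear_stack(traces)
print(np.std(traces[0] - signal), np.std(stacked - signal))  # residual noise drops by roughly 10x
```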

Plot the stacks

In [None]:
print(os.listdir(cc_data_path))
print(os.listdir(stack_data_path))

In [None]:
files = glob.glob(os.path.join(stack_data_path, '**/*.h5'), recursive=True)  # search all subdirectories
print(files)
plotting_modules.plot_all_moveout(files, 'Allstack_linear', 0.1, 0.2, 'ZZ', 1)